David542 - 3 months ago
Python Question

How to get path of all elements in lxml with attribute

I have the following code:

from lxml import etree

tree = etree.ElementTree(new_xml)
for e in new_xml.iter():
    print(tree.getpath(e), e.text)


This will give me something like the following:

/Item/Purchases

/Item/Purchases/Purchase[1]
/Item/Purchases/Purchase[1]/URL http://tvgo.xfinity.com/watch/x/6091165185315991112/movies
/Item/Purchases/Purchase[1]/Rating R

/Item/Purchases/Purchase[2]
/Item/Purchases/Purchase[2]/URL http://tvgo.xfinity.com/watch/x/6091165185315991112/movies
/Item/Purchases/Purchase[2]/Rating R


However, I need each path to identify an element by its attribute value rather than by its positional index in the list. Here is what the XML looks like:

<Item>
  <Purchases>
    <Purchase Country="US">
      <URL>http://tvgo.xfinity.com/watch/x/6091165US</URL>
      <Rating>R</Rating>
    </Purchase>
    <Purchase Country="CA">
      <URL>http://tvgo.xfinity.com/watch/x/6091165CA</URL>
      <Rating>R</Rating>
    </Purchase>
  </Purchases>
</Item>


How would I get the following path instead?

/Item/Purchases

/Item/Purchases/Purchase[@Country="US"]
/Item/Purchases/Purchase[@Country="US"]/URL http://tvgo.xfinity.com/watch/x/6091165185315991112/movies
/Item/Purchases/Purchase[@Country="US"]/Rating R

/Item/Purchases/Purchase[@Country="CA"]
/Item/Purchases/Purchase[@Country="CA"]/URL http://tvgo.xfinity.com/watch/x/6091165185315991112/movies
/Item/Purchases/Purchase[@Country="CA"]/Rating R

Answer

Not pretty, but it does the job.

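import re

replacements = {}

for e in tree.iter():
    path = tree.getpath(e)

    # Remember the attribute-based form of each indexed Purchase path.
    if re.search(r'/Purchase\[\d+\]$', path):
        new_predicate = '[@Country="' + e.attrib['Country'] + '"]'
        new_path = re.sub(r'\[\d+\]$', new_predicate, path)
        replacements[path] = new_path

    # Rewrite the indexed prefix in this path and in any descendant path.
    for key, replacement in replacements.items():
        path = path.replace(key, replacement)

    # e.text is None for elements with no text, so guard before stripping.
    print(path, (e.text or '').strip())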

This prints the following for me:

/Item 
/Item/Purchases 
/Item/Purchases/Purchase[@Country="US"] 
/Item/Purchases/Purchase[@Country="US"]/URL http://tvgo.xfinity.com/watch/x/6091165US
/Item/Purchases/Purchase[@Country="US"]/Rating R
/Item/Purchases/Purchase[@Country="CA"] 
/Item/Purchases/Purchase[@Country="CA"]/URL http://tvgo.xfinity.com/watch/x/6091165CA
/Item/Purchases/Purchase[@Country="CA"]/Rating R
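If rewriting the output of getpath() with regular expressions feels fragile, another option is to build the paths yourself while walking the tree. The following is only a minimal sketch under a few assumptions: `iter_attr_paths` is a hypothetical helper name (not part of lxml), the attribute to use is `Country`, and the sample document is the one from the question with the missing `</Purchases>` tag added. It only touches `.tag`, `.attrib`, and child iteration, so it should also run on the standard library's ElementTree if lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # the stdlib API is compatible for what this sketch uses
    import xml.etree.ElementTree as etree

# Sample document from the question (assumed well-formed).
XML = """<Item>
  <Purchases>
    <Purchase Country="US">
      <URL>http://tvgo.xfinity.com/watch/x/6091165US</URL>
      <Rating>R</Rating>
    </Purchase>
    <Purchase Country="CA">
      <URL>http://tvgo.xfinity.com/watch/x/6091165CA</URL>
      <Rating>R</Rating>
    </Purchase>
  </Purchases>
</Item>"""


def iter_attr_paths(element, attr="Country", prefix=""):
    """Yield (path, element) pairs in document order.

    A path step is written as tag[@attr="value"] when the element
    carries `attr`; otherwise it is just the tag name.
    """
    step = element.tag
    if attr in element.attrib:
        step += '[@%s="%s"]' % (attr, element.attrib[attr])
    path = prefix + "/" + step
    yield path, element
    for child in element:
        # Comments and processing instructions have a non-string tag; skip them.
        if isinstance(child.tag, str):
            yield from iter_attr_paths(child, attr, path)


root = etree.fromstring(XML)
paths = [path for path, _ in iter_attr_paths(root)]

for path, e in iter_attr_paths(root):
    # e.text may be None, so guard before stripping.
    print(path, (e.text or "").strip())
```

Two caveats: siblings that share a tag but lack the attribute (or share the same attribute value) end up with identical paths, since no positional index is emitted, and an attribute value containing a double quote would produce an invalid predicate. For the sample document neither case occurs.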