E Bobrov E Bobrov - 5 months ago 38
Python Question

How to produce xml structure up to certain xml node ussing Python?

I am new for this stuff. Because of my original xml is about 8GB it is hard to explore all parents, grandparents, grandgrandparents, etc. for the interested child in the original xml manually. I am trying to look through all the nodes until interested child is found. So I want to create "skeleton" structure of xml upto the interested child of country_data.xml from here https://docs.python.org/2/library/xml.etree.elementtree.html. Sorry for the code:

def LookThrougStructure(parent, xpath_str, stop_flag):
out_str.write('Parent tag: %s\n' % (parent.tag))
for child in parent:
if child.tag == my_tag:
out_str.write('Child tag: %s\n' % (child.tag))
#my_node_is_found_flag = 1
break
LookThrougStructure(child, child.tag, 0)
return
import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()
my_tag = 'neighbor'
out_str = open('xml_structure.txt', 'w')
LookThrougStructure(root, root.tag, my_tag)
out_str.close()


It works wrong and yelds all node tags:


Parent tag: data Parent tag: country Parent tag: rank Parent tag: year
Parent tag: gdppc Child tag: neighbor Parent tag: country Parent tag:
rank Parent tag: year Parent tag: gdppc Child tag: neighbor Parent
tag: country Parent tag: rank Parent tag: year Parent tag: gdppc Child
tag: neighbor


But I want something like that (my interested child is"neighbor"):


  • data


    • country


      • neighbor





Or that: /data/country/neighbor.
What is wrong?

Answer

If I understand you correctly you want something like:

look_through_structure(parent, my_tag):
    for node in parent.iter("*"):
        out_str.write('Parent tag: %s\n' % node.tag)
        for nxt in node:
            if nxt.tag == my_tag:
                out_str.write('child tag: %s\n' % my_tag)
                return
            out_str.write('Parent tag: %s\n' % nxt.tag)
            if any(ch.tag == my_tag for ch in nxt.getchildren()):
                out_str.write('child tag: %s\n' % my_tag)
                return

If we change the function a bit and yield the tags:

def look_through_structure(parent, my_tag):
    for node in parent.iter("*"):
        yield node.tag
        for nxt in node:
            if nxt.tag == my_tag:
                yield nxt.tag
                return
            yield nxt.tag
            if any(ch.tag == my_tag for ch in nxt.getchildren()):
                yield my_tag
                return

And run it on the file:

In [24]: root = tree.getroot()

In [25]: my_tag = 'neighbor'

In [26]: list(look_through_structure(root, my_tag))
Out[26]: ['data', 'country', 'neighbor']

Also if you just wanted the full path, lxml's getpath would do that for you:

import lxml.etree as ET

tree = ET.parse('country.xml')

my_tag = 'neighbor'

print(tree.getpath(tree.find(".//neighbor")))

Output:

/data/country[1]/neighbor[1]