misterbear - 6 months ago
Python Question

How to copy multiple XML nodes to another file in Python

Bear in mind I am very new to Python. I'm trying to copy a few XML nodes from sample1.xml to out.xml if they don't exist in sample2.xml.

This is how far I got before getting stuck:

import xml.etree.ElementTree as ET

tree = ET.ElementTree(file='sample1.xml')
addtree = ET.ElementTree(file='sample2.xml')

root = tree.getroot()
addroot = addtree.getroot()

for adel in addroot.findall('.//cars/car'):
    for el in root.findall('cars/car'):
        with open('out.xml', 'w+') as f:
            f.write("BEFORE\n")
            f.write(el.tag)
            f.write("\n")
            f.write(adel.tag)
            f.write("\n")
            f.write("\n")

            f.write("AFTER\n")

            el = adel

            f.write(el.tag)
            f.write("\n")
            f.write(adel.tag)


I have no idea what I'm missing, but it's only copying the actual "tag" itself.

outputs this:

BEFORE
car
car

AFTER
car
car


So I'm missing the children nodes, and also the <, >, </, > tags. Expected result is below.

sample1.xml:

<cars>
    <car>
        <use-car>0</use-car>
        <use-gas>0</use-gas>
        <car-name />
        <car-key />
        <car-location>hawaii</car-location>
        <car-port>5</car-port>
    </car>
</cars>


sample2.xml:

<cars>
    <old>
        1
    </old>
    <new>
        8
    </new>
    <car />
</cars>


expected result in out.xml (final product)

<cars>
    <old>
        1
    </old>
    <new>
        8
    </new>
    <car>
        <use-car>0</use-car>
        <use-gas>0</use-gas>
        <car-name />
        <car-key />
        <car-location>hawaii</car-location>
        <car-port>5</car-port>
    </car>
</cars>


All the other nodes (old and new) must remain untouched. I'm just trying to replace <car /> with the full car node, including all of its child and grandchild nodes (if any exist).

Answer

First, a couple of trivial issues with your XML:

  • sample1: The closing cars tag is missing a /
  • sample2: The closing new tag incorrectly reads old, should read new

Second, a disclaimer: my solution below has its limitations - in particular, it wouldn't handle repeatedly substituting the car node from sample1 into multiple spots in sample2. But it works fine for the sample files you've supplied.

Third: thanks to the top couple of answers on "access ElementTree node parent node", which informed the implementation of get_node_parent_info below.
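ElementTree elements keep no reference to their parent, which is why a lookup table has to be built first. A minimal sketch of that idea on its own, assuming a small inline document made up for illustration (not the sample files):

```python
import xml.etree.ElementTree as ET

# Hypothetical inline document, used only for illustration.
root = ET.fromstring("<cars><old>1</old><new>8</new><car /></cars>")

# Walk the whole tree once and map every child element to its parent.
parent_map = {child: parent for parent in root.iter() for child in parent}

car = root.find("car")
parent = parent_map[car]          # the element that directly contains <car />
index = list(parent).index(car)   # position of <car /> among its siblings

print(parent.tag, index)
```

The root element never appears as a key in the map, since it has no parent.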

Finally, the code:

import xml.etree.ElementTree as ET

def find_child(node, with_name):
    """Recursively find node with given name"""
    for element in list(node):
        if element.tag == with_name:
            return element
        elif list(element):
            sub_result = find_child(element, with_name)
            if sub_result is not None:
                return sub_result
    return None

def replace_node(from_tree, to_tree, node_name):
    """
    Replace node with given node_name in to_tree with
    the same-named node from the from_tree
    """
    # Find nodes of given name ('car' in the example) in each tree
    from_node = find_child(from_tree.getroot(), node_name)
    to_node = find_child(to_tree.getroot(), node_name)

    # Find where to substitute the from_node into the to_tree
    to_parent, to_index = get_node_parent_info(to_tree, to_node)

    # Replace to_node with from_node
    to_parent.remove(to_node)
    to_parent.insert(to_index, from_node)

def get_node_parent_info(tree, node):
    """
    Return tuple of (parent, index) where:
        parent = node's parent within tree
        index = index of node under parent
    """
    parent_map = {c:p for p in tree.iter() for c in p}
    parent = parent_map[node]
    return parent, list(parent).index(node)

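# Load both documents
from_tree = ET.ElementTree(file='sample1.xml')
to_tree = ET.ElementTree(file='sample2.xml')

# Swap the empty <car /> in sample2 for the full car node from sample1
replace_node(from_tree, to_tree, 'car')

# ET.dump(to_tree)  # uncomment to print the result to stdout
# Write the merged tree to the file named in the question
to_tree.write('out.xml')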
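The limitation mentioned in the disclaimer (substituting the car node into several spots) could probably be lifted along the following lines. This is only a sketch: replace_all is a hypothetical helper, not part of the code above, and the inline documents are made up for illustration. Each spot receives its own deep copy so the inserted subtrees stay independent of one another.

```python
import copy
import xml.etree.ElementTree as ET

def replace_all(from_node, parent, node_name):
    """
    Replace every descendant of parent whose tag is node_name with a
    deep copy of from_node. Returns the number of replacements made.
    (Hypothetical helper, shown for illustration only.)
    """
    count = 0
    # Iterate over a snapshot of the children; replacing by index keeps
    # the sibling count unchanged, so the indices stay valid.
    for index, child in enumerate(list(parent)):
        if child.tag == node_name:
            new_node = copy.deepcopy(from_node)
            new_node.tail = child.tail  # keep the whitespace that followed the old node
            parent[index] = new_node
            count += 1
        else:
            # Only descend into nodes that were not replaced
            count += replace_all(from_node, child, node_name)
    return count

# Made-up documents with two <car /> placeholders in the target
source = ET.fromstring("<cars><car><car-port>5</car-port></car></cars>")
target = ET.fromstring("<cars><car /><lot><car /></lot></cars>")

count = replace_all(source.find("car"), target, "car")
print(count)
print(ET.tostring(target, encoding="unicode"))
```

Because copies are inserted, the source tree is left unchanged, which is a difference from replace_node above, where the node from sample1 itself is moved into the sample2 tree.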

UPDATE: It was recently brought to my attention that the implementation of find_child() in the solution I originally supplied would fail if the "child" in question was not in the first branch of the XML tree that was traversed. I've updated the implementation above to rectify this.
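For comparison, ElementTree's built-in (limited) XPath support can perform the same kind of recursive search, so find_child could arguably be swapped for a single find call. A small sketch, assuming a made-up document in which the target node is deliberately not in the first branch:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <car> lives in the second branch, not the first.
root = ET.fromstring(
    "<garage>"
    "<bikes><bike /></bikes>"
    "<cars><car><car-port>5</car-port></car></cars>"
    "</garage>"
)

# './/car' searches all descendants of root in document order and
# returns the first match, or None when nothing matches.
car = root.find(".//car")
missing = root.find(".//truck")

print(car.tag, car.find("car-port").text)
print(missing)
```

Like find_child, this does not test the root element itself, only its descendants.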
