misterbear - 11 months ago
Python Question

How to copy multiple XML nodes to another file in Python

Bear in mind I am very new to Python. I'm trying to copy a few XML nodes from sample1.xml to out.xml if they don't exist in sample2.xml.

This is how far I got before getting stuck:

import xml.etree.ElementTree as ET

tree = ET.ElementTree(file='sample1.xml')
addtree = ET.ElementTree(file='sample2.xml')

root = tree.getroot()
addroot = addtree.getroot()

for adel in addroot.findall('.//cars/car'):
    for el in root.findall('cars/car'):
        with open('out.xml', 'w+') as f:
            f.write("BEFORE\n")
            f.write(el.tag)
            f.write("\n")
            f.write(adel.tag)
            f.write("\n")
            f.write("\n")

            f.write("AFTER\n")

            el = adel

            f.write(el.tag)
            f.write("\n")
            f.write(adel.tag)

I have no idea what I'm missing, but it's only copying the actual "tag" itself.

outputs this:

BEFORE
car
car

AFTER
car
car


So I'm missing the child nodes, and also the <, >, </, > tags. The expected result is below.

sample1.xml:

<cars>
<car>
<use-car>0</use-car>
<use-gas>0</use-gas>
<car-name />
<car-key />
<car-location>hawaii</car-location>
<car-port>5</car-port>
</car>
</cars>


sample2.xml:

<cars>
<old>
1
</old>
<new>
8
</new>
<car />
</cars>


expected result in out.xml (final product)

<cars>
<old>
1
</old>
<new>
8
</old>
<car>
<use-car>0</use-car>
<use-gas>0</use-gas>
<car-name />
<car-key />
<car-location>hawaii</car-location>
<car-port>5</car-port>
</car>
</cars>


All the other nodes (old and new) must remain untouched. I'm just trying to replace <car /> with the car node from sample1.xml, including all its children and grandchildren (if any exist).

Answer

First, a couple of trivial issues with your XML:

  • sample1: The closing cars tag is missing a /
  • sample2: The closing new tag incorrectly reads old, should read new

Second, a disclaimer: my solution below has its limitations - in particular, it wouldn't handle repeatedly substituting the car node from sample1 into multiple spots in sample2. But it works fine for the sample files you've supplied.

Third: thanks to the top couple of answers on the question "access ElementTree node parent node"; they informed the implementation of get_node_parent_info below.

Finally, the code:

import xml.etree.ElementTree as ET

def find_child(node, with_name):
    """Recursively find node with given name"""
    for element in list(node):
        if element.tag == with_name:
            return element
        elif list(element):
            sub_result = find_child(element, with_name)
            if sub_result is not None:
                return sub_result
    return None

def replace_node(from_tree, to_tree, node_name):
    """
    Replace node with given node_name in to_tree with
    the same-named node from the from_tree
    """
    # Find nodes of given name ('car' in the example) in each tree
    from_node = find_child(from_tree.getroot(), node_name)
    to_node = find_child(to_tree.getroot(), node_name)

    # Find where to substitute the from_node into the to_tree
    to_parent, to_index = get_node_parent_info(to_tree, to_node)

    # Replace to_node with from_node
    to_parent.remove(to_node)
    to_parent.insert(to_index, from_node)

def get_node_parent_info(tree, node):
    """
    Return tuple of (parent, index) where:
        parent = node's parent within tree
        index = index of node under parent
    """
    parent_map = {c:p for p in tree.iter() for c in p}
    parent = parent_map[node]
    return parent, list(parent).index(node)

from_tree = ET.ElementTree(file='sample1.xml')
to_tree = ET.ElementTree(file='sample2.xml')

replace_node(from_tree, to_tree, 'car')

# ET.dump(to_tree)  # uncomment to print the merged tree to stdout
to_tree.write('out.xml')  # the question asks for the result in out.xml
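
On the limitation mentioned in the disclaimer: the code above moves the single car element from sample1 into sample2, so it cannot fill more than one spot. A minimal sketch of one way around that, assuming it is acceptable to insert an independent copy of the source node at every match, is below. The name replace_all is hypothetical and is not part of the solution above; it uses copy.deepcopy so that each slot gets its own subtree.

```python
import copy
import xml.etree.ElementTree as ET


def replace_all(from_root, to_root, node_name):
    """
    Replace every node_name element under to_root with a deep copy of
    the first node_name element found in from_root.
    Returns the number of replacements made.
    """
    from_node = next(from_root.iter(node_name), None)
    if from_node is None:
        return 0

    replaced = 0
    # Snapshot the elements first so the tree is not mutated mid-iteration
    for parent in list(to_root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == node_name:
                # Each slot receives its own independent copy
                parent[index] = copy.deepcopy(from_node)
                replaced += 1
    return replaced
```

With the trees loaded as above, this could be called as replace_all(from_tree.getroot(), to_tree.getroot(), 'car') before writing to_tree out. Note that each copy carries the source node's trailing whitespace, so the indentation of the output may differ slightly from sample2.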

UPDATE: It was recently brought to my attention that the implementation of find_child() in the solution I originally supplied would fail if the "child" in question was not in the first branch of the XML tree that was traversed. I've updated the implementation above to rectify this.
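
For comparison, the same depth-first search can be sketched with the standard library's Element.iter(), which walks the whole tree in document order and so is not tied to the first branch. The helper name find_first is hypothetical; the only deliberate difference from iter() itself is that the root element is skipped, to mirror find_child, which only looks at descendants.

```python
import xml.etree.ElementTree as ET


def find_first(root, with_name):
    """Return the first descendant of root with the given tag, or None."""
    for element in root.iter(with_name):
        # iter() also yields root itself when its tag matches; skip it
        if element is not root:
            return element
    return None
```

This is only an alternative spelling of the search, under the assumption that matching on the exact tag name is all that is needed.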