Deleted Deleted - 1 year ago 94
Python Question

Python: Update XML-file using ElementTree while conserving layout as much as possible

I have a document which uses an XML namespace for which I want to increase

by one: (the file is called

<?xml version="1.0"?>
<group xmlns="">

My current result using the code below is: (the created file is called

<ns0:group xmlns:ns0="">

I would like to fix two things (if it is possible using ElementTree. If it isn´t, I´d be greatful for a suggestion as to what I should use instead):

  1. I want to keep the
    <?xml version="1.0"?>

  2. I do not want to prefix all tags, I´d like to keep it as is.

In conclusion, I don´t want to mess with the document more than I absolutely have to.

My current code (which works except for the above mentioned flaws) generating the above result follows.

I have made a utility function which loads an XML file using ElementTree and returns the elementTree and the namespace (as I do not want to hard code the namespace, and am willing to take the risk it implies):

def elementTreeRootAndNamespace(xml_file):
from xml.etree import ElementTree
import re
element_tree = ElementTree.parse(xml_file)

# Search for a namespace on the root tag
namespace_search ='^({\S+})', element_tree.getroot().tag)
# Keep the namespace empty if none exists, if a namespace exists set
# namespace to {namespacename}
namespace = ''
if namespace_search:
namespace =

return element_tree, namespace

This is my code to update the number of dogs and save it to the new file

elementTree, namespace = elementTreeRootAndNamespace('houses.xml')

# Insert the namespace before each tag when when finding current number of dogs,
# as ElementTree requires the namespace to be prefixed within {...} when a
# namespace is used in the document.
dogs = elementTree.find('{ns}house/{ns}dogs'.format(ns = namespace))

# Increase the number of dogs by one
dogs.text = str(int(dogs.text) + 1)

# Write the result to the new file houses2.xml.

Answer Source

Round-tripping, unfortunately, isn't a trivial problem. With XML, it's generally not possible to preserve the original document unless you use a special parser (like DecentXML but that's for Java).

Depending on your needs, you have the following options:

  • If you control the source and you can secure your code with unit tests, you can write your own, simple parser. This parser doesn't accept XML but only a limited subset. You can, for example, read the whole document as a string and then use Python's string operations to locate <dogs> and replace anything up to the next <. Hack? Yes.

  • You can filter the output. XML allows the string <ns0: only in one place, so you can search&replace it with < and then the same with <group xmlns:ns0="<group xmlns=". This is pretty safe unless you can have CDATA in your XML.

  • You can write your own, simple XML parser. Read the input as a string and then create Elements for each pair of <> plus their positions in the input. That allows you to take the input apart quickly but only works for small inputs.