Deleted Deleted - 4 months ago 43
Python Question

Python: Update XML-file using ElementTree while conserving layout as much as possible

I have a document which uses an XML namespace for which I want to increase

/group/house/dogs
by one: (the file is called
houses.xml
)

<?xml version="1.0"?>
<group xmlns="http://dogs.house.local">
<house>
<id>2821</id>
<dogs>2</dogs>
</house>
</group>


My current result using the code below is: (the created file is called
houses2.xml
)

<ns0:group xmlns:ns0="http://dogs.house.local">
<ns0:house>
<ns0:id>2821</ns0:id>
<ns0:dogs>3</ns0:dogs>
</ns0:house>
</ns0:group>


I would like to fix two things (if it is possible using ElementTree. If it isn´t, I´d be greatful for a suggestion as to what I should use instead):


  1. I want to keep the
    <?xml version="1.0"?>
    line.

  2. I do not want to prefix all tags, I´d like to keep it as is.



In conclusion, I don´t want to mess with the document more than I absolutely have to.

My current code (which works except for the above mentioned flaws) generating the above result follows.

I have made a utility function which loads an XML file using ElementTree and returns the elementTree and the namespace (as I do not want to hard code the namespace, and am willing to take the risk it implies):

def elementTreeRootAndNamespace(xml_file):
from xml.etree import ElementTree
import re
element_tree = ElementTree.parse(xml_file)

# Search for a namespace on the root tag
namespace_search = re.search('^({\S+})', element_tree.getroot().tag)
# Keep the namespace empty if none exists, if a namespace exists set
# namespace to {namespacename}
namespace = ''
if namespace_search:
namespace = namespace_search.group(1)

return element_tree, namespace


This is my code to update the number of dogs and save it to the new file
houses2.xml
:

elementTree, namespace = elementTreeRootAndNamespace('houses.xml')

# Insert the namespace before each tag when when finding current number of dogs,
# as ElementTree requires the namespace to be prefixed within {...} when a
# namespace is used in the document.
dogs = elementTree.find('{ns}house/{ns}dogs'.format(ns = namespace))

# Increase the number of dogs by one
dogs.text = str(int(dogs.text) + 1)

# Write the result to the new file houses2.xml.
elementTree.write('houses2.xml')

Answer

Round-tripping, unfortunately, isn't a trivial problem. With XML, it's generally not possible to preserve the original document unless you use a special parser (like DecentXML but that's for Java).

Depending on your needs, you have the following options:

  • If you control the source and you can secure your code with unit tests, you can write your own, simple parser. This parser doesn't accept XML but only a limited subset. You can, for example, read the whole document as a string and then use Python's string operations to locate <dogs> and replace anything up to the next <. Hack? Yes.

  • You can filter the output. XML allows the string <ns0: only in one place, so you can search&replace it with < and then the same with <group xmlns:ns0="<group xmlns=". This is pretty safe unless you can have CDATA in your XML.

  • You can write your own, simple XML parser. Read the input as a string and then create Elements for each pair of <> plus their positions in the input. That allows you to take the input apart quickly but only works for small inputs.