Praveer N - 1 year ago
Python Question

Replacing multiple strings with regex in Python for a file gives a truncated string

The following Python code

import xml.etree.ElementTree as ET
import time
import fileinput
import re

ts = str(int(time.time()))
modifiedline = ''
for line in fileinput.input("singleoutbound.xml"):
    line = re.sub('OrderName=".*"', 'OrderName="' + ts + '"', line)
    line = re.sub('OrderNo=".*"', 'OrderNo="' + ts + '"', line)
    line = re.sub('ShipmentNo=".*"', 'ShipmentNo="' + ts + '"', line)
    line = re.sub('TrackingNo=".*"', 'TrackingNo="' + ts + '"', line)
    line = re.sub('WaveKey=".*"', 'WaveKey="' + ts + '"', line)
    modifiedline += line

returns the modifiedline string with some lines truncated at the point where the first match is found.

How do I ensure it returns the complete string for each line?


I have changed the way I am solving this problem, inspired by Tomalak's answer:

import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
import time

ts = str(int(time.time()))

doc = ET.parse('singleoutbound.xml')

# ElementTree needs the relative form './/*'; it visits every descendant of the root
for elem in doc.iterfind('.//*'):
    if 'OrderName' in elem.attrib:
        elem.attrib['OrderName'] = ts
    if 'OrderNo' in elem.attrib:
        elem.attrib['OrderNo'] = ts
    if 'ShipmentNo' in elem.attrib:
        elem.attrib['ShipmentNo'] = ts
    if 'TrackingNo' in elem.attrib:
        elem.attrib['TrackingNo'] = ts
    if 'WaveKey' in elem.attrib:
        elem.attrib['WaveKey'] = ts


Answer Source

Here is how to use ElementTree to make modifications to an XML file without accidentally breaking it:

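import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
import time

ts = str(int(time.time()))

doc = ET.parse('singleoutbound.xml')

# ElementTree needs the relative form of //*[@OrderName];
# it matches descendants of the root, not the root element itself
for elem in doc.iterfind('.//*[@OrderName]'):
    elem.attrib['OrderName'] = ts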

# and so on
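
One way the "and so on" part could be filled in is sketched below. It is only an illustration: the SAMPLE document and the stamp() helper are made up for this example (the real singleoutbound.xml is not shown in the question), and the list of attribute names is taken from the question's code.

```python
import time
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for singleoutbound.xml.
SAMPLE = (
    '<Shipments>'
    '<Shipment OrderName="A1" OrderNo="7" ShipmentNo="9" Carrier="UPS">'
    '<Line TrackingNo="T1" WaveKey="W1"/>'
    '</Shipment>'
    '</Shipments>'
)

# Attribute names taken from the question.
ATTRS = ("OrderName", "OrderNo", "ShipmentNo", "TrackingNo", "WaveKey")


def stamp(root, value, names=ATTRS):
    """Set each attribute in `names` to `value` wherever it already exists."""
    # iter() visits the root element itself and every descendant.
    for elem in root.iter():
        for name in names:
            if name in elem.attrib:
                elem.set(name, value)
    return root


ts = str(int(time.time()))
root = stamp(ET.fromstring(SAMPLE), ts)
result = ET.tostring(root, encoding="unicode")
print(result)
```

For a real file, the same helper could be applied to ET.parse('singleoutbound.xml').getroot(), and the tree saved again with doc.write(...). Attributes that are not in the list, such as Carrier in the sample, are left untouched.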


Things to understand:

  • XML represents a tree-shaped data structure that consists of elements, attributes and values, among other things. Treating it as line-based plain text fails to recognize this fact.
  • There is a language to select items from that tree of data, called XPath. It's powerful and not difficult to learn. Learn it. I've used //*[@OrderName] above to find all elements that have an OrderName attribute.
  • Trying to modify the document tree with improper tools like string replace and regular expressions will lead to more complex and hard-to-maintain code. You will encounter run-time errors for completely valid input that your regex has no special case for, character encoding issues and silent errors that are only caught when someone looks at your program's output. In other words: It's the wrong thing to do, so don't do it.
  • The above code is actually simpler and much easier to reason about and extend than your code.
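
To make the point about silent errors concrete for the original question: the truncation most likely comes from the greedy `.*`, which runs to the last double quote on the line and swallows every attribute in between. The sketch below uses a made-up one-line element (an assumption, since the real file is not shown) to compare the greedy pattern, a narrower pattern, and the parser-based approach.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical line with several attributes on the same line.
line = '<Shipment OrderName="A1" OrderNo="7" Carrier="UPS"/>'

# Greedy: ".*" extends to the LAST double quote on the line,
# so OrderNo and Carrier are swallowed by the match.
greedy = re.sub('OrderName=".*"', 'OrderName="123"', line)
print(greedy)   # <Shipment OrderName="123"/>

# Restricting the value to non-quote characters keeps the rest of the line.
narrow = re.sub('OrderName="[^"]*"', 'OrderName="123"', line)
print(narrow)   # <Shipment OrderName="123" OrderNo="7" Carrier="UPS"/>

# The parser-based version changes one attribute and nothing else.
elem = ET.fromstring(line)
elem.set("OrderName", "123")
print(ET.tostring(elem, encoding="unicode"))
```

The narrower pattern fixes this particular symptom, but it would still rewrite an attribute whose name merely ends in OrderName and would miss single-quoted values, which is the kind of special case the list above warns about.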