Mechanic Mechanic - 5 months ago 11
Python Question

Parsing xml with same tags and different attributes

I have a network file created from osm using Netconvert. The root element is edge with different attributes. For example, in the first part of the file, the edges are organized as follows.

<edge id=":367367171_1" function="internal">
<lane id=":367367171_1_0" index="0" disallow="tram rail_urban rail rail_electric ship" speed="5.56" length="15.86" shape="7413.68,8096.43 7409.39,8098.94 7406.50,8098.93 7405.03,8096.39 7404.96,8091.32"/>
</edge>
<edge id=":367367171_2" function="internal">
<lane id=":367367171_2_0" index="0" disallow="tram rail_urban rail rail_electric ship" speed="5.56" length="9.40" shape="7413.68,8096.43 7412.34,8099.01 7410.83,8099.98 7409.14,8099.36 7407.28,8097.13"/>
</edge>
<edge id=":367367171_3" function="internal">
<lane id=":367367171_3_0" index="0" disallow="tram rail_urban rail rail_electric ship" speed="5.56" length="5.56" shape="7408.25,8091.65 7407.28,8097.13"/>
</edge>
<edge id=":367367171_4" function="internal">
<lane id=":367367171_4_0" index="0" disallow="tram rail_urban rail rail_electric ship" speed="5.56" length="5.69" shape="7408.25,8091.65 7408.69,8097.32"/>
</edge>


In the second part, the attributes of the edge file changes and it looks like below

<edge id="102323265#13" from="1181188708" to="1181188720" priority="1" type="highway.cycleway">
<lane id="102323265#13_0" index="0" allow="bicycle" speed="5.56" length="1.96" width="1.00" shape="14310.67,8986.24 14309.63,8984.59"/>
</edge>
<edge id="102323265#2" from="2577245263" to="1721713370" priority="1" type="highway.cycleway" shape="14903.54,9214.01 14891.64,9210.58 14796.11,9178.46 14789.16,9175.24">
<lane id="102323265#2_0" index="0" allow="bicycle" speed="5.56" length="113.82" width="1.00" shape="14898.81,9213.21 14891.49,9211.10 14795.93,9178.98 14791.04,9176.72"/>
</edge>
<edge id="102323265#3" from="1721713370" to="1193980046" priority="1" type="highway.cycleway" shape="14789.16,9175.24 14783.34,9171.87 14779.91,9168.83 14776.75,9165.32">
<lane id="102323265#3_0" index="0" allow="bicycle" speed="5.56" length="9.86" width="1.00" shape="14786.63,9174.41 14783.01,9172.31 14779.55,9169.24 14778.85,9168.47"/>
</edge>
<edge id="102323265#4" from="1193980046" to="1193980047" priority="1" type="highway.cycleway" shape="14776.75,9165.32 14764.89,9151.27 14762.54,9144.61">
<lane id="102323265#4_0" index="0" allow="bicycle" speed="5.56" length="20.05" width="1.00" shape="14774.71,9163.77 14764.40,9151.55 14763.05,9147.72"/>
</edge>
<edge id="102323265#5" from="1193980047" to="1193980057" priority="1" type="highway.cycleway" shape="14762.54,9144.61 14760.31,9140.42 14753.93,9131.92 14749.20,9127.42 14743.90,9123.46 14738.81,9120.77 14731.67,9118.17 14707.61,9110.82">
<lane id="102323265#5_0" index="0" allow="bicycle" speed="5.56" length="60.21" width="1.00" shape="14760.51,9141.98 14759.82,9140.67 14753.49,9132.25 14748.82,9127.82 14743.57,9123.90 14738.55,9121.26 14731.49,9118.68 14710.43,9112.25"/>
</edge>


As you can see, there are different attributes for the element edge. When I try to access the elements using the following code,

for elem in netFile.iter(tag='edge'):
print(elem.attrib['from'])


I get a
KeyError:'from'


When I change the key to
'function'
instead of
'from'
, the code prints me multiple lines of
'internal'
and when it approaches the end of the first part, it again throw me

KeyError: 'function'
.

I understand that I have to selectively iterate through the edges in which the attribute
'from'
is present, but have no idea on how to proceed. Can someone help?

Thanks

Answer

You could detect wich part of the file you are proccessing by the presence of the attributes e.g.:

# The !required! attributes for each part
part1_attributes = ["id", "function"]
part2_attributes = ["id", "from", "to", "priority", "type"]

for elem in netFile.iter(tag='edge'):
    if all([attr in elem.attrib for attr in part1_attributes]):
        # part 1
        print("function: " + elem.attrib["function"])
    elif all([attr in elem.attrib for attr in part2_attributes]):
        # part 2
        print("from: " + elem.attrib["from"])
    else:
        print("Unknown part found while parsing xml")
        # or raise Exception("message...") or exit program etc.

If one of the edges suddendly does not contain one of the attributes this will sort it out and return an error (or just print it and continue), instead of returning None like in gr1zzly be4r's answer.

Comments