Niche.P - 1 month ago
Python Question

Python XML: get text inside <p>...</p> tag

Hi guys, I have an XML structure that looks roughly like this:

<abstract>
<p id = "p-0001" num = "0000">
blah blah blah
</p>
</abstract>


I would like to extract the <p> tag inside the <abstract> tag only.

I tried:

import xml.etree.ElementTree as ET

xroot = ET.parse('100/A/US07640598-20100105.XML').getroot()

for row in xroot.iter('p'):
    print(row.text)


This gets every <p> tag in my XML, which is not what I want.

Is there any way I can extract only the text of the <p> tag inside <abstract>?

My desired output would be "blah blah blah".

Answer

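You can use an XPath expression to search for p elements specifically inside the abstract. Elements from xml.etree.ElementTree have no .xpath() method (that belongs to lxml), but findall() accepts a limited subset of XPath that is enough here:

for p in xroot.findall(".//abstract//p"):
    print(p.text.strip())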
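A self-contained way to see the difference, assuming a made-up document embedded as a string instead of the real file (the <patent> root and the <description> section are hypothetical stand-ins for the rest of the patent XML):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one <p> inside <abstract>, one outside it.
SAMPLE = """<patent>
  <abstract>
    <p id="p-0001" num="0000">
      blah blah blah
    </p>
  </abstract>
  <description>
    <p id="p-0002" num="0001">some other paragraph</p>
  </description>
</patent>"""

root = ET.fromstring(SAMPLE)

# Unfiltered: iter('p') walks the whole tree and returns every <p>.
all_paragraphs = [p.text.strip() for p in root.iter('p')]

# Filtered: ElementTree's limited XPath keeps only <p> elements below <abstract>.
abstract_paragraphs = [p.text.strip() for p in root.findall('.//abstract//p')]

print(all_paragraphs)       # ['blah blah blah', 'some other paragraph']
print(abstract_paragraphs)  # ['blah blah blah']
```

The strip() call matters because the text of the first <p> includes the surrounding newlines and indentation.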

Alternatively, with iter() you can use a nested loop:

for abstract in xroot.iter('abstract'):
    for p in abstract.iter('p'):
        print(p.text.strip())
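One caveat: p.text only holds the text before the first child element. If a paragraph contains inline markup (the <b> tag below is a hypothetical example; whether the real file has such tags is an assumption), joining p.itertext() collects all of it. A sketch using the same nested-loop approach:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample where the abstract paragraph contains inline markup.
SAMPLE = """<patent>
  <abstract>
    <p id="p-0001" num="0000">blah <b>blah</b> blah</p>
  </abstract>
  <description>
    <p id="p-0002" num="0001">some other paragraph</p>
  </description>
</patent>"""

root = ET.fromstring(SAMPLE)

results = []
for abstract in root.iter('abstract'):
    for p in abstract.iter('p'):
        # p.text would stop at the <b> child ('blah ' here);
        # itertext() yields the text of <p>, its children, and their tails.
        results.append(''.join(p.itertext()).strip())

print(results)  # ['blah blah blah']
```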