Niche.P - 1 month ago
Python Question

Element Tree find output empty text

I have a problem using ElementTree to extract text.

The format of my XML file is:

<elecs id = 'elecs'>
<elec id = "CLM-0001" num = "0001">
<elec-text> blah blah blah </elec-text>
<elec-text> blah blah blah </elec-text>
</elec>
<elec id = "CLM-0002" num = "0002">
<elec-text> blah blah blah </elec-text>
<elec-text> blah blah blah </elec-text>
</elec>
</elecs>


I want to extract all the text inside the elec-text tags.

Assume that the path to our XML file is in the variable xml:

import xml.etree.ElementTree as ET
from lxml import etree

parser = etree.XMLParser(recover=True)
contents = open(xml).read()
tree = ET.fromstring(contents, parser=parser)
elecsN = tree.find('elecs')
for element in elecsN:
    print(element.text)


The problem is that the code above prints empty strings. I have tried the same code with other tags in my document and it works. I do not know why it prints empty strings this time.

Is there any way I can solve this problem?

Thank you very much

Answer

If you really do mean any way, you could use lxml with an XPath expression:

>>> from io import StringIO
>>> html = StringIO('''\
... <elecs id = 'elecs'>
...     <elec id = "CLM-0001" num = "0001">
...         <elec-text> blah blah blah </elec-text>
...         <elec-text> blah blah blah </elec-text>
...     </elec>
...     <elec id = "CLM-0002" num = "0002">
...         <elec-text> blah blah blah </elec-text>
...         <elec-text> blah blah blah </elec-text>
...     </elec>
... </elecs>
... '''
... )
>>> from lxml import etree
>>> doc = etree.parse(html)
>>> doc.xpath('//elecs/elec/*/text()')
[' blah blah blah ', ' blah blah blah ', ' blah blah blah ', ' blah blah blah ']
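
As for why the original loop printed nothing useful: assuming the real document looks like the sample in the question, iterating over elecs yields the elec elements, and the text of an elec element is only the whitespace before its first child. The text you want lives one level down, in the elec-text elements. Here is a minimal sketch with the standard library alone; extract_elec_texts is a hypothetical helper name, and the sample string is copied from the question.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the question.
SAMPLE = """\
<elecs id = 'elecs'>
    <elec id = "CLM-0001" num = "0001">
        <elec-text> blah blah blah </elec-text>
        <elec-text> blah blah blah </elec-text>
    </elec>
    <elec id = "CLM-0002" num = "0002">
        <elec-text> blah blah blah </elec-text>
        <elec-text> blah blah blah </elec-text>
    </elec>
</elecs>
"""


def extract_elec_texts(xml_string):
    """Return the stripped text of every <elec-text> element (hypothetical helper)."""
    root = ET.fromstring(xml_string)
    # iter() walks the whole subtree, so this works whether <elecs> is the
    # root element or nested deeper inside a larger document.
    return [(node.text or "").strip() for node in root.iter("elec-text")]


root = ET.fromstring(SAMPLE)
# Each <elec> holds only whitespace before its first child, which is why
# printing element.text for the <elec> elements looks empty.
elec_texts = [elec.text for elec in root.iter("elec")]

texts = extract_elec_texts(SAMPLE)
print(texts)
```

Note that if elecs is the root element of your file, find('elecs') on the root returns None, because find only searches the children of the element it is called on; iter sidesteps that.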