Kevin Burke - 1 year ago
Python Question

Get all text inside a tag in lxml

I'd like to write a code snippet that would grab all of the text inside the <content> tag, in lxml, in all three instances below, including the code tags. What I've tried so far misses the text in between the tags. I didn't have much luck searching the API for a relevant function. Could you help me out?

<div>Text inside tag</div>
#should return "<div>Text inside tag</div>"

Text with no tag
#should return "Text with no tag"

Text outside tag <div>Text inside tag</div>
#should return "Text outside tag <div>Text inside tag</div>"

Answer Source


from lxml.etree import tostring

def stringify_children(node):
    # node.text is the text before the first child; tostring() appends each
    # child's tail by default, so text between and after the children is kept
    parts = [node.text] + [tostring(c, encoding='unicode') for c in node]
    # filter removes a possible None in node.text
    return ''.join(filter(None, parts))


from lxml import etree
node = etree.fromstring("""<content>
Text outside tag <div>Text <em>inside</em> tag</div>
</content>""")
stringify_children(node)

Produces: '\nText outside tag <div>Text <em>inside</em> tag</div>\n'