chell chell - 2 months ago 8
Ruby Question

How to get all the text excluding text with specific tags with Nokogiri?

I have the following XML:

<w:body>
<w:p w14:paraId="15812FB6" w14:textId="27A946A1" w:rsidR="001665B3" w:rsidRDefault="00771852">
<w:r>
<w:t xml:space="preserve">I am writing this </w:t>
</w:r>
<w:ins w:author="Mitchell Gould" w:date="2016-10-04T17:24:00Z" w:id="0">
<w:r w:rsidR="00A1573E">
<w:t>text to look</w:t>
</w:r>
</w:ins>
<w:del w:author="Mitchell Gould" w:date="2016-10-04T17:24:00Z" w:id="1">
<w:r w:rsidDel="00A1573E">
<w:delText>to test</w:delText>
</w:r>
</w:del>
...


I know that I can get all of the text using:

only_text_array = @file.search('//text()')


However, I actually want two sets of text:


  • One that contains all of the text except the text from the <w:del>...</w:del> elements.

  • Another that contains all of the text except the text from the <w:ins>...</w:ins> elements.



How can I accomplish this?

Answer

You can try the following XPath:

//text()[not(ancestor::w:del or ancestor::w:ins)]


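This XPath returns all text nodes that have no w:del or w:ins ancestor, so it excludes both kinds of tracked change at once. To get the two separate sets asked for, test one ancestor at a time: //text()[not(ancestor::w:del)] gives everything except the deleted text, and //text()[not(ancestor::w:ins)] gives everything except the inserted text.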