nautical - 1 month ago
Bash Question

XPath expression to get node based on attribute value

I have the following input xml file:

<rootnode>
    <section id="1" status="fail">
        <outer status="fail">
            <inner status="fail"/>
            <inner status="pass"/>
        </outer>
        <outer status="pass">
            <inner status="pass"/>
        </outer>
        <outer status="pass"/>
        <outer status="fail"/>
    </section>
    <section id="2" status="fail">
        <outer status="fail">
            <inner status="pass"/>
            <inner status="fail"/>
            <inner status="inc"/>
        </outer>
    </section>
</rootnode>


I want to filter out all non-fail status nodes so that the result looks like this:

<rootnode>
    <section id="1" status="fail">
        <outer status="fail">
            <inner status="fail"/>
        </outer>
        <outer status="fail"/>
    </section>
    <section id="2" status="fail">
        <outer status="fail">
            <inner status="fail"/>
        </outer>
    </section>
</rootnode>


The <rootnode> element does not have to be included in the result. I have tried to use xmllint with an XPath expression. I can extract specific nodes with

xmllint --xpath "//inner" input.xml
xmllint --xpath "//@status" input.xml


but these either return the nodes regardless of their status value, or return only the attributes without the surrounding nodes.

Is there a way to do this with an XPath expression? If not, a simple solution that uses other Bash tools is fine, too.

Answer

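Like @svasa said in a comment, you should use XSLT. An XPath expression can only select nodes; it cannot produce a modified copy of the document, so selecting the fail elements would still bring along their non-fail descendants. You can easily run XSLT from bash with xsltproc, xmlstarlet (using the tr command), Saxon (Java on the command line), etc.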
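The processors listed above can be wrapped in a small Bash function that uses whichever one is installed. This is a minimal sketch, not a canonical recipe: the function name apply_xslt and the SAXON_JAR variable are hypothetical, and it assumes the stylesheet and input are saved as files, as in the so.xsl / so.xml example below.

```bash
#!/usr/bin/env bash
# apply_xslt (hypothetical helper): apply the XSLT stylesheet $1 to the XML
# file $2 using the first XSLT processor found on this machine.
apply_xslt() {
  local xsl=$1 xml=$2
  if command -v xsltproc >/dev/null 2>&1; then
    # libxslt's processor: stylesheet first, then the input document
    xsltproc "$xsl" "$xml"
  elif command -v xmlstarlet >/dev/null 2>&1; then
    # XMLStarlet's "tr" (transform) subcommand takes the same argument order
    xmlstarlet tr "$xsl" "$xml"
  elif [[ -n ${SAXON_JAR:-} ]]; then
    # Saxon on the JVM; SAXON_JAR is an assumed variable holding the jar path
    java -cp "$SAXON_JAR" net.sf.saxon.Transform -s:"$xml" -xsl:"$xsl"
  else
    echo "apply_xslt: no XSLT processor found" >&2
    return 1
  fi
}
```

For example: apply_xslt so.xsl so.xml > filtered.xml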

Here's an example using xsltproc:

$ xsltproc so.xsl so.xml
<?xml version="1.0"?>
<rootnode>
  <section id="1" status="fail">
    <outer status="fail">
      <inner status="fail"/>
    </outer>
    <outer status="fail"/>
  </section>
  <section id="2" status="fail">
    <outer status="fail">
      <inner status="fail"/>
    </outer>
  </section>
</rootnode>

XML Input (so.xml)

<rootnode>
    <section id="1" status="fail">
        <outer status="fail">
            <inner status="fail"/>
            <inner status="pass"/>
        </outer>
        <outer status="pass">
            <inner status="pass"/>
        </outer>
        <outer status="pass"/>
        <outer status="fail"/>
    </section>
    <section id="2" status="fail">
        <outer status="fail">
            <inner status="pass"/>
            <inner status="fail"/>
            <inner status="inc"/>
        </outer>
    </section>
</rootnode>

XSLT 1.0 (so.xsl)

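<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Drop whitespace-only text nodes from the input and re-indent the output -->
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity transform: copy every node and attribute unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: an element whose status is anything other than "fail"
       produces no output, so it is dropped together with its descendants -->
  <xsl:template match="*[@status[not(normalize-space()='fail')]]"/>

</xsl:stylesheet>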
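If XMLStarlet is installed, an equivalent form of the predicate from the stylesheet's empty template can also be used directly with its ed (edit) command, whose -d option deletes every node an XPath expression selects, so no stylesheet file is needed. This is a minimal sketch assuming xmlstarlet is on the PATH (on some systems the binary is named xml); the function name keep_fail_only is hypothetical. Unlike the stylesheet, it does not strip or re-indent whitespace, so whitespace-only lines may remain where elements were removed.

```bash
#!/usr/bin/env bash
# keep_fail_only (hypothetical helper): print the XML file given as $1 with
# every element whose status is not "fail" removed, descendants included.
keep_fail_only() {
  # "ed -d XPATH" deletes each node the expression selects. The predicate is
  # equivalent to the one in so.xsl's empty template; elements without a
  # status attribute (such as rootnode) are not selected and are kept.
  xmlstarlet ed -d '//*[@status[normalize-space() != "fail"]]' "$1"
}

# Example: keep_fail_only so.xml > filtered.xml
```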