BRATVADDI - 3 months ago
Perl Question

Right XPath expression for XML when using XML::LibXML

I am having trouble arriving at the right XPath expression to query data from XML. I use XML::LibXML to do this.

The XML



<?xml version="1.0" encoding="iso-8859-1"?>
<data>
<header>
<date>2016-08-07</date>
<name>Indices Composites</name>
<version>1.1a</version>
</header>
<row>
<CompositePrice>1.010227784212584</CompositePrice>
<CompositeSpread>0.002568273865609903</CompositeSpread>
<Date>2016-08-05</Date>
<Depth>4</Depth>
<Heat>0.0201994587386602</Heat>
<IndexID>ITRAXX-SOVXWES8V1-5Y</IndexID>
<Maturity>2017-12-20</Maturity>
<ModelPrice>1.0103988929051526</ModelPrice>
<ModelSpread>0.002445016658588964</ModelSpread>
<Name>iTraxx SovX Westn Europe</Name>
<OnTheRun>Y</OnTheRun>
<REDCode>5C769MAO9</REDCode>
<RequestKey>iTraxx SovX Westn Europe|5Y|Y</RequestKey>
<Series>8</Series>
<ShortName></ShortName>
<Term>5Y</Term>
<Version>1</Version>
</row>
<row>
<CompositePrice>1.0208723593556004</CompositePrice>
<CompositeSpread>0.006539233068666665</CompositeSpread>
<Date>2016-08-05</Date>
<Depth>3</Depth>
<Heat>0.0307106033333336</Heat>
<IndexID>ITRAXX-SOVXWES8V1-10Y</IndexID>
<Maturity>2022-12-20</Maturity>
<ModelPrice>1.0219657857189512</ModelPrice>
<ModelSpread>0.006361337372712667</ModelSpread>
<Name>iTraxx SovX Westn Europe</Name>
<OnTheRun>Y</OnTheRun>
<REDCode>5C769MAO9</REDCode>
<RequestKey>iTraxx SovX Westn Europe|10Y|Y</RequestKey>
<Series>8</Series>
<ShortName></ShortName>
<Term>10Y</Term>
<Version>1</Version>
</row>
</data>


I need to filter rows based on the values of certain child elements. The code is as follows.

my $parser = XML::LibXML->new;

my $doc = $parser->parse_file($inputFile);

my @nodes = $doc->findnodes("/data/row/Name[text()='iTraxx SovX Westn Europe']/../Term[text()='5Y']/../OnTheRun[text()='Y']");

print "@nodes \n";


The output I get is

<OnTheRun>Y</OnTheRun>


whereas I would like to get the entire row node that satisfies the conditions.

Is my XPath expression right?

Answer

XPath expressions are very much like Linux file paths. If you remove all the predicates from what you have written, you get

/data/row/Name/../Term/../OnTheRun

You can see here that, from the row element, you're descending into Name and going back up one level, then into Term and going back up one level, and finally into OnTheRun, where the expression stops

This is why you get only the OnTheRun element back, and a simple fix would be to append another .. path step to get back up to the row element that you want to access
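
To see the walk for yourself, you can print the name of the node that each prefix of the path ends on. This is only a minimal sketch: the inline XML string is a cut-down copy of the sample data that I have assumed for illustration, and it is not part of the original question

```perl
use strict;
use warnings;

use XML::LibXML;

# Assumption: a cut-down inline copy of the sample document, for illustration only
my $xml = <<'XML';
<data>
  <row><Name>iTraxx SovX Westn Europe</Name><OnTheRun>Y</OnTheRun><Term>5Y</Term></row>
  <row><Name>iTraxx SovX Westn Europe</Name><OnTheRun>Y</OnTheRun><Term>10Y</Term></row>
</data>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

# Each path step moves the context node; the final step decides what is returned
for my $path ( '/data/row/Name', '/data/row/Name/..', '/data/row/Name/../Term/../OnTheRun' ) {
    my @names = map { $_->nodeName } $doc->findnodes($path);
    print "$path => @names\n";
}
```

With that cut-down input the three lines end in Name Name, row row and OnTheRun OnTheRun respectively, which shows that the result is always whatever the last step lands on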

This XPath expression works fine

/data/row/Name[text()='iTraxx SovX Westn Europe']/../Term[text()='5Y']/../OnTheRun[text()='Y']/..

but it is very awkward to read

I think the neatest way to do this is to apply multiple predicates to the main /data/row selector, like this

/data/row[Name="iTraxx SovX Westn Europe"][Term="5Y"][OnTheRun="Y"]
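
As an aside, as long as none of the predicates is positional, chaining them should select the same nodes as joining the conditions with and inside a single predicate. The sketch below checks that on a cut-down inline copy of the sample data, which is my own assumption for illustration

```perl
use strict;
use warnings;

use XML::LibXML;

# Assumption: a cut-down inline copy of the sample document, for illustration only
my $xml = <<'XML';
<data>
  <row><Name>iTraxx SovX Westn Europe</Name><OnTheRun>Y</OnTheRun><Term>5Y</Term></row>
  <row><Name>iTraxx SovX Westn Europe</Name><OnTheRun>Y</OnTheRun><Term>10Y</Term></row>
</data>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

# Chained predicates: each one filters the node set left by the previous one
my ($chained) = $doc->findnodes('/data/row[Name="iTraxx SovX Westn Europe"][Term="5Y"][OnTheRun="Y"]');

# Single predicate: all three conditions combined with "and"
my ($anded) = $doc->findnodes('/data/row[Name="iTraxx SovX Westn Europe" and Term="5Y" and OnTheRun="Y"]');

print $chained->isSameNode($anded) ? "same row\n" : "different rows\n";
```

Which form you use is mostly a matter of taste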

Here's a full program that uses the multiple-predicate expression to process your sample data

use strict;
use warnings 'all';
use open IO  => ":encoding(iso-8859-1)";

use XML::LibXML;

my $doc = XML::LibXML->load_xml( location => 'indices_composites.xml' );

my @nodes = $doc->findnodes('/data/row[Name="iTraxx SovX Westn Europe"][Term="5Y"][OnTheRun="Y"]');

printf "%d node%s found:\n\n", scalar @nodes, @nodes == 1 ? '' : 's';

print $nodes[0], "\n";

output

1 node found:

<row>
    <CompositePrice>1.010227784212584</CompositePrice>
    <CompositeSpread>0.002568273865609903</CompositeSpread>
    <Date>2016-08-05</Date>
    <Depth>4</Depth>
    <Heat>0.0201994587386602</Heat>
    <IndexID>ITRAXX-SOVXWES8V1-5Y</IndexID>
    <Maturity>2017-12-20</Maturity>
    <ModelPrice>1.0103988929051526</ModelPrice>
    <ModelSpread>0.002445016658588964</ModelSpread>
    <Name>iTraxx SovX Westn Europe</Name>
    <OnTheRun>Y</OnTheRun>
    <REDCode>5C769MAO9</REDCode>
    <RequestKey>iTraxx SovX Westn Europe|5Y|Y</RequestKey>
    <Series>8</Series>
    <ShortName/>
    <Term>5Y</Term>
    <Version>1</Version>
  </row>
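
Once you have the row element itself, you can read its children with paths relative to that node. The following is a sketch rather than a finished program: it assumes a cut-down inline copy of the sample data, and the choice of IndexID and CompositeSpread as the fields to print is mine, just for illustration. It also checks for an empty result before using it

```perl
use strict;
use warnings;

use XML::LibXML;

# Assumption: a cut-down inline copy of the sample document, for illustration only
my $xml = <<'XML';
<data>
  <row>
    <CompositeSpread>0.002568273865609903</CompositeSpread>
    <IndexID>ITRAXX-SOVXWES8V1-5Y</IndexID>
    <Name>iTraxx SovX Westn Europe</Name>
    <OnTheRun>Y</OnTheRun>
    <Term>5Y</Term>
  </row>
  <row>
    <CompositeSpread>0.006539233068666665</CompositeSpread>
    <IndexID>ITRAXX-SOVXWES8V1-10Y</IndexID>
    <Name>iTraxx SovX Westn Europe</Name>
    <OnTheRun>Y</OnTheRun>
    <Term>10Y</Term>
  </row>
</data>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

my @rows = $doc->findnodes('/data/row[Name="iTraxx SovX Westn Europe"][Term="5Y"][OnTheRun="Y"]');
die "No matching row found\n" unless @rows;

for my $row (@rows) {
    # Relative paths are evaluated with $row as the context node
    my $id     = $row->findvalue('IndexID');
    my $spread = $row->findvalue('CompositeSpread');
    print "$id $spread\n";
}
```

With this input it prints a single line, ITRAXX-SOVXWES8V1-5Y 0.002568273865609903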