Nishant Kansal - 3 months ago
Bash Question

Shell script to fetch value of a node appearing multiple times in an XML

I have an XML file as below:

<artifact>
<a>1.zip</a>
<b>2-SNAPSHOT.zip</b>
<c>3-SNAPSHOT.zip</c>
</artifact>
<artifact>
<a>4.tar</a>
<b>5.tar</b>
<c>6.tar</c>
</artifact>


My requirement is to fetch the value "5.tar", which sits in node "b" of the 2nd occurrence of node "artifact". I am able to fetch the value if "artifact" is present only once in the XML. However, if the same node appears two or more times in the same XML, I am not able to fetch it.

Please help.

Answer

Here is a step-by-step breakdown of a solution I worked out using xmllint:

$ echo "cat //root/artifact/b" |  xmllint --shell BuildResult.xml | sed '/^\/ >/d' | sed 's/<[^>]*.//g' | tr -d '\n' | awk -F"-------" '{print $2}'
5.tar

I have reformatted your original BuildResult.xml file by wrapping its contents in a <root> node and adding an XML declaration header. An XML document must have exactly one root element, so this avoids parsing errors:-

$ xmllint --format BuildResult.xml
<?xml version="1.0" standalone="yes"?>
<root>
  <artifact>
    <a>1.zip</a>
    <b>2-SNAPSHOT.zip</b>
    <c>3-SNAPSHOT.zip</c>
  </artifact>
  <artifact>
    <a>4.tar</a>
    <b>5.tar</b>
    <c>6.tar</c>
  </artifact>
</root>
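
The wrapping step itself can also be done from the shell. The following is a minimal sketch, not necessarily how the file above was produced; the names fragment.xml and wrapped.xml are hypothetical and chosen only for illustration.

```shell
#!/bin/bash
# Sketch: wrap a multi-root XML fragment in a single <root> element.
# "fragment.xml" and "wrapped.xml" are hypothetical names for illustration.
workdir=$(mktemp -d)

# The unwrapped fragment from the question.
cat > "$workdir/fragment.xml" <<'EOF'
<artifact>
<a>1.zip</a>
<b>2-SNAPSHOT.zip</b>
<c>3-SNAPSHOT.zip</c>
</artifact>
<artifact>
<a>4.tar</a>
<b>5.tar</b>
<c>6.tar</c>
</artifact>
EOF

# Emit the XML declaration, the opening <root>, the fragment, then </root>.
{
  printf '%s\n' '<?xml version="1.0" standalone="yes"?>' '<root>'
  cat "$workdir/fragment.xml"
  printf '%s\n' '</root>'
} > "$workdir/wrapped.xml"

cat "$workdir/wrapped.xml"
```

The result is not indented, but it is well-formed, which is all the later steps rely on.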

The steps, in the order they execute:-

First, run xmllint in interactive shell mode (xmllint --shell) and pipe in a cat command with the path from the root node to the repeating node (//root/artifact/b), which selects the b node under every artifact node.

Running this part on its own, without the rest of the pipeline, produces:

/ >  -------
<b>2-SNAPSHOT.zip</b>
 -------
<b>5.tar</b>
/ > 

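Next, remove the prompt lines and the XML tags using sed, i.e. sed '/^\/ >/d' | sed 's/<[^>]*.//g'. The first sed deletes every line starting with the "/ >" prompt (including the first ------- separator, which shares a line with the prompt), and the second strips the tags. This produces: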

2-SNAPSHOT.zip
 -------
5.tar

Now remove the newlines from the above output using tr -d '\n', so that awk can process it as a single record with ------- as the field separator:

2-SNAPSHOT.zip -------5.tar

The awk command applied to the above output prints the second field, which is the value needed: awk -F"-------" '{print $2}'

5.tar
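
To inspect the text-processing stages one at a time, they can be replayed on the raw xmllint --shell output shown earlier, without needing xmllint itself. This is a sketch for experimentation; the variable names (raw_output, stripped, joined, value) are hypothetical.

```shell
#!/bin/bash
# Sketch: replay the sed/tr/awk stages on the captured "xmllint --shell"
# output from the answer. Variable names are illustrative only.
raw_output='/ >  -------
<b>2-SNAPSHOT.zip</b>
 -------
<b>5.tar</b>
/ > '

# Stage 1: drop the "/ >" prompt lines, then strip the XML tags.
stripped=$(printf '%s\n' "$raw_output" | sed '/^\/ >/d' | sed 's/<[^>]*.//g')

# Stage 2: join everything onto a single line.
joined=$(printf '%s\n' "$stripped" | tr -d '\n')

# Stage 3: split on the ------- separator and keep the second field.
value=$(printf '%s\n' "$joined" | awk -F"-------" '{print $2}')

printf 'after sed: %s\n' "$stripped"
printf 'after tr:  %s\n' "$joined"
printf 'after awk: %s\n' "$value"
```

Because the first separator is deleted together with the prompt line, field $1 holds the first match (with a trailing space) and field $2 holds the second.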

Putting it together in a shell script, it looks like

#!/bin/bash

newVar=$(echo "cat //root/artifact/b" |  xmllint --shell BuildResult.xml | sed '/^\/ >/d' | sed 's/<[^>]*.//g' | tr -d '\n' | awk -F"-------" '{print $2}')
echo "$newVar"

P.S:- The pipeline can be simplified by combining some of the awk/sed commands. This is just one solution that works.
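
As one possible simplification, xmllint builds that support the --xpath option can select the second artifact directly with a positional predicate, which removes the sed/tr/awk stages entirely. This is a sketch under that assumption, not the method used above; the helper name nth_artifact_b is hypothetical, and the script recreates BuildResult.xml in a temporary directory so it is self-contained.

```shell
#!/bin/bash
# Sketch: pick the <b> of the n-th <artifact> with an XPath predicate.
# Assumes an xmllint build that supports --xpath.
workdir=$(mktemp -d)

cat > "$workdir/BuildResult.xml" <<'EOF'
<?xml version="1.0" standalone="yes"?>
<root>
  <artifact>
    <a>1.zip</a>
    <b>2-SNAPSHOT.zip</b>
    <c>3-SNAPSHOT.zip</c>
  </artifact>
  <artifact>
    <a>4.tar</a>
    <b>5.tar</b>
    <c>6.tar</c>
  </artifact>
</root>
EOF

# Hypothetical helper: print the text of <b> inside the n-th <artifact>.
# string(...) makes xmllint print the text content instead of the element.
nth_artifact_b() {
  local file=$1 n=$2
  xmllint --xpath "string(/root/artifact[$n]/b)" "$file"
}

if command -v xmllint >/dev/null 2>&1; then
  newVar=$(nth_artifact_b "$workdir/BuildResult.xml" 2)
  echo "$newVar"
else
  echo "xmllint not found; it ships with the libxml2 utilities" >&2
fi
```

XPath positions are 1-based, so artifact[2] is the second occurrence; changing the index selects a different one.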
