Richard Rublev - 5 months ago
Bash Question

How to grep my XML file and save the output?

I am only showing part of a huge XML file:

<caldata chopper="on" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c0 unit="V">0.00000000e+00</c0>
<c1 unit="Hz">4.00000000e+04</c1>
<c2 unit="V/(nT*Hz)">8.35950000e-06</c2>
<c3 unit="deg">-1.17930000e+02</c3>
</caldata>
<caldata chopper="on" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c0 unit="V">0.00000000e+00</c0>
<c1 unit="Hz">5.55810000e+04</c1>
<c2 unit="V/(nT*Hz)">4.43400000e-06</c2>
<c3 unit="deg">-1.58280000e+02</c3>
</caldata>
<caldata chopper="on" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c0 unit="V">0.00000000e+00</c0>
<c1 unit="Hz">6.00000000e+04</c1>
<c2 unit="V/(nT*Hz)">3.63180000e-06</c2>
<c3 unit="deg">-1.67340000e+02</c3>
</caldata>
<caldata chopper="off" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c0 unit="V">0.00000000e+00</c0>
<c1 unit="Hz">4.00000000e-01</c1>
<c2 unit="V/(nT*Hz)">1.07140000e-02</c2>
<c3 unit="deg">1.48080000e+02</c3>
</caldata>
<caldata chopper="off" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c0 unit="V">0.00000000e+00</c0>
<c1 unit="Hz">5.55800000e-01</c1>
<c2 unit="V/(nT*Hz)">1.33250000e-02</c2>
<c3 unit="deg">1.39110000e+02</c3>
</caldata>
<caldata chopper="off" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c0 unit="V">0.00000000e+00</c0>
<c1 unit="Hz">7.72300000e-01</c1>
<c2 unit="V/(nT*Hz)">1.57750000e-02</c2>
<c3 unit="deg">1.29560000e+02</c3>


I have tried this:

grep '<c1 unit="Hz"' *.xml | cut -f2 -d">"|cut -f1 -d"<"


It works fine, but what I really want is output only for the caldata elements with chopper="off", and to save that output to a file. How can I do this?

Answer

One solution is to use an XML-aware grep such as xgrep. I tried it on my machine and got this:

$ xgrep -t -x '//caldata[@chopper="off"]/c1[@unit="Hz"]/text()' test.xml 
4.00000000e-01
5.55800000e-01
7.72300000e-01

The key is the XPath expression:

  • //caldata[@chopper="off"] - select all caldata elements whose chopper attribute equals off;
  • c1[@unit="Hz"] - from those caldata elements, select the c1 children whose unit attribute equals Hz;
  • text() - from those c1 elements, take only the text content.
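To make the three XPath steps concrete, here is a rough line-oriented sketch of the same filter in plain awk, which also saves the result to a file. This is a swapped-in technique, not an XML parser: it assumes each tag sits on its own line, as in the sample from the question, and the file names sample.xml and chopper_off_hz.txt are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: approximate //caldata[@chopper="off"]/c1[@unit="Hz"]/text() with awk.
# Assumption: one tag per line, as in the question's sample. NOT a real XML parser.
set -euo pipefail

# Hypothetical sample input, trimmed from the question.
cat > sample.xml <<'EOF'
<caldata chopper="on" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c0 unit="V">0.00000000e+00</c0>
<c1 unit="Hz">4.00000000e+04</c1>
</caldata>
<caldata chopper="off" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c0 unit="V">0.00000000e+00</c0>
<c1 unit="Hz">4.00000000e-01</c1>
</caldata>
<caldata chopper="off" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c0 unit="V">0.00000000e+00</c0>
<c1 unit="Hz">5.55800000e-01</c1>
</caldata>
EOF

awk '
  /<caldata / { off = ($0 ~ /chopper="off"/) }  # step 1: opening tag, remember whether chopper="off"
  /<\/caldata>/ { off = 0 }                     # leaving the caldata element
  off && /<c1 unit="Hz">/ {                     # step 2: c1 with unit="Hz" inside such a caldata
    sub(/.*<c1 unit="Hz">/, "")                 # step 3: strip the opening tag ...
    sub(/<\/c1>.*/, "")                         # ... and the closing tag, keeping only the text
    print
  }
' sample.xml > chopper_off_hz.txt               # redirect to save the result

cat chopper_off_hz.txt
```

On this sample the output file should contain 4.00000000e-01 and 5.55800000e-01, one per line. Because the chopper test looks anywhere in the opening tag, attribute order does not matter, but an element split across several lines will break it; for real XML the XPath tools remain the safer choice.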

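I don't know whether you can install a third-party tool like xgrep, but if you can, it is probably your best solution. To save the result, just redirect it to a file, for example by appending > chopper_off.txt to the xgrep command.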
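If installing xgrep is not an option, a smaller change to the original grep/cut pipeline is to let sed pass through only the lines between each chopper="off" opening tag and its closing </caldata>. Like the original attempt, this sketch is line-oriented: it assumes one tag per line and that chopper is the first attribute of caldata. The names sample.xml and off_hz.txt are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: restrict the question's grep/cut pipeline to chopper="off" blocks, then save it.
# Assumptions: one tag per line, and chopper is the first attribute of <caldata>.
set -euo pipefail

# Hypothetical sample input (two blocks trimmed from the question).
cat > sample.xml <<'EOF'
<caldata chopper="on" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c1 unit="Hz">6.00000000e+04</c1>
</caldata>
<caldata chopper="off" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c1 unit="Hz">7.72300000e-01</c1>
</caldata>
EOF

# sed -n '/start/,/end/p' prints each range from a chopper="off" opening tag
# through the next </caldata>; the rest is the pipeline from the question.
sed -n '/<caldata chopper="off"/,/<\/caldata>/p' sample.xml \
  | grep '<c1 unit="Hz"' \
  | cut -f2 -d">" | cut -f1 -d"<" > off_hz.txt

cat off_hz.txt
```

For your real data, replace sample.xml with *.xml as in your original command.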
