Richard Rublev
Bash Question

Query a specific range of line numbers from an XML file

I want to query an XML file, but only lines 1374-1601 of it.

I have tried this

$ sed -n '1374,1601p' *.xml |
xmlstarlet sel -t -v '//caldata[@chopper="on"]/c2[@unit="V/(nT*Hz)"]'


But I got

-:7.9: Extra content at the end of the document
<caldata chopper="on" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)"
^


My idea was to select the lines with sed and pipe them to xmlstarlet, but that does not work.

Part of the XML file:

<channel id="2">
<calibration>
<cal_version>1.0</cal_version>
<creator>software chcal 1.2</creator>
<user>metronix</user>
<calibrated_item>
<ci identifier="coil">MFS07e</ci>
<ci_serial_number>252</ci_serial_number>
<ci_revision/>
<ci_date>2013-01-24</ci_date>
<ci_time>10:57:24</ci_time>
<ci_calibration_valid_until/>
<ci_next_calibration/>
<ci_tag/>
<ci_owner/>
<ci_owners_address/>
<ci_manufacturer>metronix</ci_manufacturer>
<ci_manufacturers_address>Kocher Str. 3, 38120 Braunschweig, Germany</ci_manufacturers_address>
<ci_comments/>
</calibrated_item>
<calibration_equipment>
<ce/>
<ce_serial_number/>
<ce_revision/>
<ce_date/>
<ce_time/>
<ce_calibration_valid_until>1970-01-01</ce_calibration_valid_until>
<ce_next_calibration/>
<ce_tag/>
<ce_operator/>
<ce_location>Magnetsrode</ce_location>
<ce_contact_address>Kocher Str. 3, 38120 Braunschweig, Germany</ce_contact_address>
<ce_comments/>
</calibration_equipment>
<calibration_protocol>
<mtx>
<mtx_serial_numer_engraved/>
<mtx_preamplifier_serialnumber/>
<mtx_ch1_div_ch2_at_1000gain_0dot025_hz/>
<mtx_phase_deg_at_1000gain_0dot025_hz/>
<mtx_ch1_div_ch2_at_1000gain_0dot025_hz_calibrated/>
<mtx_phase_deg_at_1000gain_0dot025_hz_calibrated/>
<rec_freq_resp_at_0dot_0025_to_100_hz_at_gain_1000/>
<rec_freq_resp_at_0dot_0025_to_100_hz_at_gain_100/>
<rec_freq_resp_at_0dot_0025_to_100_hz_at_gain_10/>
<rec_freq_resp_at_0dot_0025_to_100_hz_at_gain_1/>
<gain_factor_preamplifier/>
<actual_sensitivity_ch2_by_ch1/>
<actual_sensitivity_theta/>
<calibrated_sensitivity_ch2_by_ch1/>
<calibrated_sensitivity_theta/>
<calibrated_cal_path_ch2_by_ch1/>
<calibrated_cal_path_theta/>
<calibrated_chopper_on/>
<calibrated_chopper_off/>
<calibrated_chopper_ukn/>
</mtx>
</calibration_protocol>
<caldata chopper="on" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
<c0 unit="V">0.00000000e+00</c0>
<c1 unit="Hz">4.00000000e-01</c1>
<c2 unit="V/(nT*Hz)">1.93430000e-02</c2>
<c3 unit="deg">8.92260000e+01</c3>
</caldata>


I have tried to use bichop's advice

xmlstarlet sel -t -v \
'//channel[@id="2"]/caldata[@chopper="on"]/c2[@unit="V/(nT*Hz)"]'


but it does not work.

Here is my XML file:

http://pastebin.com/0BJTAMGV

Answer

The xmlstarlet command at the end of your question has a problem that means it will never match anything. The initial part of your query is...

//channel[@id="2"]/caldata[@chopper="on"]

...but according to your sample data, this will never match, because the channel element does not have any caldata elements as immediate children. The tag hierarchy is:

channel
  calibration
    cal_version
    creator
    user
    calibrated_item
    calibration_equipment
    calibration_protocol
    caldata
    ...

So you would need at least:

//channel[@id="2"]/calibration/caldata[@chopper="on"]

Given your sample data, if I close off all the unclosed tags, I get:

$ xmlstarlet sel -t -v \
  '//channel[@id="2"]/calibration/caldata[@chopper="on"]' data.xml 
  0.00000000e+00
  4.00000000e-01
  1.93430000e-02
  8.92260000e+01

And in fact, with that one correction your entire query seems to work:

$ xmlstarlet sel -t -v \
  '//channel[@id="2"]/calibration/caldata[@chopper="on"]/c2[@unit="V/(nT*Hz)"]' data.xml
1.93430000e-02