Manuel De Oliveira - 2 months ago
Perl Question

Parsing a long XML file with Perl

Hi everyone, I'm a new Perl programmer and I'm trying to fetch some data from a long XML file, but my code can't get both pieces of data at the same time. I'd like to know how to use a loop or some other structure efficiently to get the data I need.

<datetime>7/28/2016 12:00:00 AM - 12:00:15 AM</datetime>
<value channel="Traffic Total (volume)" channelid="1">4,664,204 KByte</value>
<value_raw channel="Traffic Total (volume)" channelid="1">4776145337.3504</value_raw>
<value channel="Traffic Total (speed)" channelid="1">517,319 kbit/s</value>
<value_raw channel="Traffic Total (speed)" channelid="1">64664843.4518</value_raw>
<value channel="Traffic DL (volume)" channelid="2">3,805,763 KByte</value>
<value_raw channel="Traffic DL (volume)" channelid="2">3897101197.8596</value_raw>
<value channel="Traffic DL (speed)" channelid="2">422,107 kbit/s</value>
<value_raw channel="Traffic DL (speed)" channelid="2">52763352.2591</value_raw>
<value channel="Traffic UL (volume)" channelid="3">858,442 KByte</value>
<value_raw channel="Traffic UL (volume)" channelid="3">879044139.4907</value_raw>
<value channel="Traffic UL (speed)" channelid="3">95,212 kbit/s</value>
<value_raw channel="Traffic UL (speed)" channelid="3">11901491.1927</value_raw>
<coverage>100 %</coverage>

I have hundreds of items like these, and I need to extract each datetime together with the value for channel="Traffic Total (volume)". Here is an extract of my Perl code:

my $reader = XML::LibXML::Reader->new(string => "$HDF") or die "cannot read file.xml\n";

while ($reader->nextElement( 'item' )) {
    my $item = $reader->readInnerXml;
    while ($reader->nextElement( 'datetime' )) {
        $DT = $reader->readInnerXml;
        print $DT;

        while ($reader->nextElement( 'value' )) {
            my $value = $reader->readInnerXml;
            if ($value eq 'Traffic Total (speed)'){
                $HD = $reader->readInnerXml;
                print $HD;
            }
        }
    }
}

Thanks for your comments about it.


For long XML files, I find XML::Twig works really well: it can use twig_handlers and purge as it parses, so you can process subsets of the XML efficiently.

So, assuming you want to process the file one "item" at a time:

#!/usr/bin/env perl

use strict;
use warnings;

use XML::Twig;

my @things = ( './datetime', './value[@channel="Traffic Total (speed)"]' );

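sub process_item {
   my ( $twig, $item ) = @_;
   # Print the datetime and the chosen channel value, tab-separated
   print join( "\t", map { $item -> get_xpath($_,0) -> text } @things ), "\n";
   # Free everything parsed so far; this item is no longer needed
   $twig -> purge;
}

my $twig = XML::Twig -> new ( twig_handlers => { 'item' => \&process_item } );
$twig -> parsefile ('your_file.xml');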

What purge does is discard everything parsed 'up to this point' from memory, which makes this approach quite efficient for XML containing a large number of similar elements.
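
If you would rather stay with XML::LibXML::Reader, as in the question, the same per-item idea can be sketched roughly as below. This is only a sketch under some assumptions: the <items>/<item> wrapper in the sample string is a guess (the question shows only the inside of one item), extract_pairs is a hypothetical helper name, and the channel name is assumed not to contain a double quote. It steps to each 'item', copies that subtree into a small DOM node, and runs two XPath lookups on the copy, so the datetime and the value always come from the same item.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use XML::LibXML::Reader;

# Sample input. The <items>/<item> wrapper is an assumption; the values
# are taken from the item shown in the question.
my $xml = <<'XML';
<items>
  <item>
    <datetime>7/28/2016 12:00:00 AM - 12:00:15 AM</datetime>
    <value channel="Traffic Total (volume)" channelid="1">4,664,204 KByte</value>
    <value channel="Traffic Total (speed)" channelid="1">517,319 kbit/s</value>
    <coverage>100 %</coverage>
  </item>
</items>
XML

# Hypothetical helper: returns one [ datetime, value ] pair per <item>.
sub extract_pairs {
    my ( $xml_string, $channel ) = @_;

    my $reader = XML::LibXML::Reader->new( string => $xml_string )
        or die "cannot create reader\n";

    my @rows;
    # nextElement returns 1 while another <item> is found
    while ( $reader->nextElement('item') == 1 ) {
        # Deep-copy only this <item> subtree into a DOM node
        my $item = $reader->copyCurrentNode(1);

        # Both lookups are relative to the same item
        my $datetime = $item->findvalue('./datetime');
        my $value    = $item->findvalue(qq{./value[\@channel="$channel"]});

        push @rows, [ $datetime, $value ];
    }
    return @rows;
}

# Print each pair tab-separated, one item per line
print join( "\t", @$_ ), "\n"
    for extract_pairs( $xml, 'Traffic Total (volume)' );
```

To read from a file instead of a string, XML::LibXML::Reader->new also accepts location => 'your_file.xml'. Swap the channel name for 'Traffic Total (speed)' if that is the one you are after.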