UsefulUserName - 5 months ago
Perl Question

Iterating through CSV and creating an XML file



I am trying to parse a CSV file in Perl and paste the information of some columns into an XML-file. I've never done anything in Perl, and my idea was to store the data into an array and then pull the information out of the array as I build it.

I'm sure I am doing several things wrong, since I am not getting the value I am expecting but instead what looks like array addresses in memory (for example, ARRAY(0x35e9360)).

Could somebody help me out and point me to a better solution?

Here is the code in question:

use Text::CSV;
use utf8;
use XML::Simple qw(XMLout);
use XML::Twig;
use File::Slurp;
use Encode;

&buildXML();

my $csv = Text::CSV->new( { binary => 1 } ) # should set binary attribute.
or die "Cannot use CSV: " . Text::CSV->error_diag();

$csv = Text::CSV->new( { sep_char => '|' } );
$csv = Text::CSV_XS->new( { allow_loose_quotes => 1 } );

my $t = XML::Twig->new( pretty_print => indented );
$t->parsefile('output.xml');

$out_file = "output.xml";
open( my $fh_out, '>>', $out_file ) or die "unable to open $out_file for writing: $!";

my $root = $t->root; #get the root

open my $fh, "<:encoding(utf8)", "b.txt" or die "text.txt: $!";

while ( my $row = $csv->getline($fh) ) {

my @rows = $row;

$builds = $root->first_child(); # get the builds node
$xcr = $builds->first_child(); #get the xcr node

my $xcrCopy = $xcr->copy(); #copy the xcr node
$xcrCopy->paste( after, $xcr ); #paste the xcr node

$xcr->set_att( id => "@rows[0]" );
print {$fh_out} $t->sprint();
}

$csv->eof or $csv->error_diag();


Here is a testfile:

ID|Name|Pos
1|a|265
2|b|950
3|c|23
4|d|798
5|e|826
6|f|935
7|g|852
8|h|236
9|i|642


Here is the XML that is built by the buildXML() sub.

<?xml version='1.0' standalone='yes'?>
<project>
<builds>
<xcr id="" name="" pos="" />
</builds>
</project>

Answer

The getline method of Text::CSV returns an arrayref. From its documentation:

It reads a row from the IO object $io using $io->getline () and parses this row into an array ref.

The ARRAY(0x35e9360) is indeed what you get when you print out an array reference. This is usual; many parsers return a reference to an array for a row. So you need to dereference it, generally with @{ $arrayref }, but in this case there is no ambiguity and one can drop the curlies: @$arrayref.
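As a minimal sketch of the point above, using only core Perl: the row below is a hypothetical stand-in for what getline would return for the line 1|a|265, and the variable names are illustrative. It also shows why my @rows = $row; in the question did not help: that creates a one-element array holding the reference itself.

```perl
use strict;
use warnings;

# Hypothetical row, shaped like what getline returns for "1|a|265".
my $rowref = [ 1, 'a', 265 ];

# Interpolating the reference itself gives the ARRAY(0x...) text.
my $as_string = "$rowref";

# Assigning the reference to an array does NOT unpack it:
# @wrong has exactly one element, and that element is still the reference.
my @wrong = $rowref;

# Dereferencing yields the actual columns.
my @cols  = @$rowref;         # whole row as an array
my $first = $rowref->[0];     # one element through the reference
my $count = scalar @$rowref;  # number of columns

print "reference:  $as_string\n";
print "wrong[0]:   $wrong[0]\n";
print "columns:    @cols\n";
print "first of $count: $first\n";
```

The arrow form, $rowref->[0], is usually the most convenient when only one or two columns are needed.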

use warnings;
use strict;
use Text::CSV_XS;
use XML::Twig;

my $csv = Text::CSV_XS->new (
    { binary => 1, sep_char => '|',  allow_loose_quotes => 1 }
) or die "Cannot use CSV: " . Text::CSV_XS->error_diag();

my $t = XML::Twig->new(pretty_print => 'indented');
$t->parsefile('output.xml');
my $out_file = 'output.xml';
open my $fh_out, '>>', $out_file  or die "Can't open $out_file for writing: $!";
my $root = $t->root; # get the root

my $file = 'b.txt';
open my $fh, "<:encoding(utf8)", $file  or die "Can't open $file: $!";

while (my $rowref = $csv->getline($fh)) {
    #my @cols = @$rowref;
    #print "@cols\n";

    my $builds = $root->first_child();  # get the builds node
    my $xcr = $builds->first_child();   # get the xcr node
    my $xcrCopy = $xcr->copy();         # copy the xcr node
    $xcrCopy->paste('after', $xcr);     # paste the xcr node
    $xcr->set_att(id => $rowref->[0]);  # or $cols[0];

    print $fh_out $t->sprint();
}

With @cols and its print uncommented, this prints the following for the CSV file:

ID Name Pos
1 a 265
2 b 950
...

So we've read the file OK.

The XML processing is just copied from the question, except for the part that uses the CSV value. We take the first element of the current row, which is $rowref->[0] since $rowref is a reference. (Or use an element from the dereferenced array, $cols[0].)

I don't know how the output XML should look but it is built out of the shown template.
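For illustration only, here is one way the loop could fill in all three attributes. This is a sketch under assumptions, not the asker's confirmed target: it assumes one xcr element per data row, that the header row should be skipped, and that the document is serialized once after the loop. The sample rows and the template are inlined so the snippet runs on its own.

```perl
use strict;
use warnings;
use Text::CSV_XS;
use XML::Twig;

# First rows of the question's test file, inlined for a self-contained run.
my $data = join "\n", 'ID|Name|Pos', '1|a|265', '2|b|950', '3|c|23', '';

# Template as shown in the question (output of buildXML()).
my $template = <<'XML';
<?xml version='1.0' standalone='yes'?>
<project>
<builds>
<xcr id="" name="" pos="" />
</builds>
</project>
XML

my $csv = Text::CSV_XS->new({ binary => 1, sep_char => '|' })
    or die "Cannot use CSV: " . Text::CSV_XS->error_diag();

my $t = XML::Twig->new(pretty_print => 'indented');
$t->parse($template);

my $builds = $t->root->first_child('builds');
my $blank  = $builds->first_child('xcr');    # the empty template node

open my $fh, '<', \$data or die "Can't open in-memory data: $!";
$csv->getline($fh);                           # assumption: skip the header row

while (my $rowref = $csv->getline($fh)) {
    my ($id, $name, $pos) = @$rowref;         # dereference the row
    my $xcr = $blank->copy;                   # fresh copy of the template node
    $xcr->set_att(id => $id, name => $name, pos => $pos);
    $xcr->paste(last_child => $builds);       # append, keeping file order
}
$blank->delete;                               # drop the empty template node

my $xml = $t->sprint;                         # serialize once, after the loop
print $xml;
```

Serializing after the loop avoids writing a growing copy of the whole document for every row, which is what printing $t->sprint() inside the loop does.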


Note. A single element of an array is a scalar, so it bears a $: $cols[0]. To extract multiple columns you can use an array slice; the result is a list, so it needs the @. For example, @cols[0,2] is a list of the first and third elements. This can then be assigned to a list of variables, for example my ($c1, $c3) = @cols[0,2];.
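The sigil rule in the note can be checked with a short core-Perl sketch. The values are a hypothetical row taken from the test file, and the last line shows the same slice taken through a reference, which is the form needed when working directly with what getline returns.

```perl
use strict;
use warnings;

# Hypothetical row from the test file: ID, Name, Pos.
my @cols = ('1', 'a', '265');

my $id        = $cols[0];            # single element: a scalar, so $
my ($c1, $c3) = @cols[0, 2];         # slice: a list, so @

my $rowref    = \@cols;
my ($r1, $r3) = @{$rowref}[0, 2];    # the same slice through a reference

print "id=$id c1=$c1 c3=$c3 r1=$r1 r3=$r3\n";
```

With warnings enabled, writing @cols[0] for a single element draws a "Scalar value @cols[0] better written as $cols[0]" warning, which is a useful hint that the wrong sigil was used.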