Goof Goof - 1 month ago 11
Perl Question

Line by line parsing of Perl Mechanize content

I'm creating Perl scripts to automatically download CSVs from various billers' websites, but I'm having trouble getting the data from $mech->content() into a form I can parse line by line. The content is a multi-line CSV file.

#!/usr/bin/perl
use WWW::Mechanize;
use IO::Socket::SSL qw();

my $mech = WWW::Mechanize->new();
...stuff...
my $data=$mech->content();
my (@lines)=split(/\n?\r/,$data);
print "lines=".@lines."\n---\n@lines\n---\n";
write_file("tmp.csv",$data);

for(my $i=0;$i<@lines;$i++){
...work that's done that depends on each
line being represented as an element of
an array...
}


Originally I assigned $mech->content() directly to @lines, and I tried a few other things such as $mech->content( raw => 1 ). As you can see above, I also tried splitting on \n or \r.
Browser shows the csv file as text/plain, Quirks mode, UTF-8
Running file tmp.csv shows it's ASCII text and is multiline.

What am I doing wrong, and what's the right way to do this?

Answer

The problem is here:

my (@lines)=split(/\n?\r/,$data);

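You have the newline regex backwards. /\n?\r/ means "an optional newline followed by a mandatory carriage return", so on data that ends its lines with a bare \n (which is what file reporting plain ASCII text suggests) it never matches, and split returns the whole file as a single element. The pattern you want is \r?\n, but it's safer to write \015?\012 for the literal characters because \r and \n can be different on some systems.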
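As a rough illustration of the difference, here is a minimal sketch that uses a made-up string in place of $mech->content(). The sample data and the variable names @broken and @lines are assumptions for the example only.

```perl
use strict;
use warnings;

# Hypothetical payload standing in for $mech->content():
# a mix of CRLF and bare LF line endings.
my $data = "id,amount\015\012" . "1,9.50\012" . "2,12.00\015\012";

# The original pattern needs a carriage return to match, so LF-only
# input is never split and comes back as one element.
my @broken = split /\n?\r/, "a\nb\nc\n";

# Optional CR followed by a mandatory LF, written as octal escapes.
my @lines = split /\015?\012/, $data;

printf "broken=%d lines=%d\n", scalar @broken, scalar @lines;
print "[$_]\n" for @lines;
```

split drops trailing empty fields, so the final line terminator does not produce an empty last element.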

Your for loop can be better written as:

for my $line (@lines) {

However, you generally don't want to process entire files as an array. What you're doing can use a tremendous amount of memory. Instead it's better to first save it to disk and read the CSV file line by line.

use autodie;

$mech->get( $uri, ':content_file' => "test.csv" );

open my $fh, '<', "test.csv";
while( my $line = <$fh> ) {
    ...
}

But don't do your own CSV parsing. It's much faster and less buggy to use Text::CSV_XS.
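For illustration, a minimal sketch of reading rows with Text::CSV_XS might look like the following. It assumes the module is installed from CPAN, and it reads from an in-memory string with invented sample data instead of the downloaded test.csv so that it stands alone. The column names are hypothetical.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical sample data; in the real script this would be the saved CSV file.
my $data = qq{biller,amount,note\n"Acme, Inc.",9.50,"paid\nlate"\nWater Co,12.00,\n};

# binary => 1 permits embedded newlines inside quoted fields;
# auto_diag => 1 makes the parser report errors instead of failing silently.
my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

# Open the string as an in-memory filehandle so the sketch is self-contained.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @rows;
while ( my $row = $csv->getline($fh) ) {
    push @rows, $row;    # $row is an array reference holding one record's fields
}
close $fh;

printf "%d rows, first biller: %s\n", scalar @rows, $rows[1][0];
```

The second record shows why splitting on newlines and commas by hand is fragile: it contains both a quoted comma and a quoted line break, yet it is returned as a single row of three fields.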