newB.ailey newB.ailey - 6 months ago 52
Perl Question

Removing extra commas from csv file in perl

I have a multiple CSV files each with a different amount of entries each with roughly 300 lines each.

The first line in each file is the Data labels

Person_id, person_name, person_email, person_address, person_recruitmentID, person_comments... etc


The Rest of the lines in each file contain the data

"0001", "bailey", "123 fake, street", "bailey@mail.com", "0001", "this guy doesnt know how to get rid of, commas!"... etc


I want to get rid of commas that are in between quotation marks.
I'm currently going through the Text::CSV documentation but its a slow process.

Answer Source

Let's get this out of the way: you cannot read a CSV by just splitting on commas. You've just demonstrated why; commas might be escaped or inside quotes. Those commas are totally valid, they're part of the data. Discarding them mangles the data in the CSV.

For this reason, and others, CSV files must be read using a CSV parsing library. To find which commas are data and which commas are structural also requires parsing the CSV using a CSV parsing library. So you won't be saving yourself any time by trying to remove the commas from inside quotes. Instead you'll give yourself more work while mangling the data. You'll have to use a CSV parsing library.

Text::CSV_XS is a very good, very fast CSV parsing library. It has a ton of features, most of which you do not need. Fortunately it has examples for doing most common actions.

For example, here's how you read and print each row from a file called file.csv.

use strict;
use warnings;
use autodie;
use v5.10;   # for `say`

use Text::CSV_XS;

# Open the file.
open my $fh, "<", "file.csv";

# Create a new Text::CSV_XS object.
my $csv = Text::CSV_XS->new;

# Read in the header line so it's not counted as data.
# Then you can use $csv->getline_hr() to read each row in as a hash.
$csv->header($fh);

# Read each row.
while( my $row = $csv->getline($fh) ) {
    # Do whatever you want with the list of cells in $row.
    # This prints them separated by semicolons.
    say join "; ", @$row;
}
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download