BigRedEO - 7 months ago
Perl Question

Perl "scrub" characters while parsing

I'm parsing a file. The first thing I do is concatenate the first three fields and prepend the result to each record. Then I want to scrub the data of any colons, single quotes, double quotes, or backslashes. The following is how I'm doing it, but is there a more efficient way to do the scrubbing on the $line variable itself?

# Read the lines one by one.
while($line = <$FH>) {

    # Split the fields, concatenate the first three fields,
    # and add the result to the beginning of each line in the file.
    chomp($line);
    my @fields = split(/,/, $line);
    unshift @fields, join '_', @fields[0..2];

    # Scrub data of characters that cause scripting problems down the line.
    $_ =~ s/:/ /g for @fields[0..39];
    $_ =~ s/\'/ /g for @fields[0..39];
    $_ =~ s/"/ /g for @fields[0..39];
    $_ =~ s/\\/ /g for @fields[0..39];
}

Answer

To me, it would be cleaner to scrub the whole line once with a single character class before splitting it:

while($line = <$FH>) {
    chomp($line);

    $line =~ s/[:\'"\\]/ /g;

    my @fields = split(/,/, $line);
    unshift @fields, join '_', @fields[0..2];
}
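
For illustration, here is a self-contained sketch of the same idea that swaps the substitution for tr///, which replaces single characters without going through the regex engine and is usually cheaper for this kind of scrub. The helper name scrub_and_split and the sample record are hypothetical, not from the question. Note that, like the loop above, it scrubs every field on the line rather than only the first 40.

```perl
use strict;
use warnings;

# Hypothetical helper: scrub one raw line, then build the field list.
sub scrub_and_split {
    my ($line) = @_;
    chomp $line;

    # One pass over the whole line: each colon, single quote,
    # double quote, or backslash becomes a space.
    $line =~ tr/:'"\\/ /;

    # Split on commas and prepend the first three fields joined by '_'.
    my @fields = split /,/, $line;
    unshift @fields, join '_', @fields[0..2];
    return @fields;
}

# Made-up sample record containing all four unwanted characters.
my @fields = scrub_and_split(qq{a:1,b'2,c"3,d\\4\n});
print join('|', @fields), "\n";    # prints: a 1_b 2_c 3|a 1|b 2|c 3|d 4
```

Because the replacement list in tr/// is shorter than the search list, its last character (the space) is reused for every character being replaced.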

As @HunterMcMillen said, if this is a standard CSV file, it would be better to use a CSV parsing module, which will make things easier down the road.
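
As a rough sketch of what the module route could look like, the example below uses Text::CSV. That choice is an assumption: the answer does not name a module, and Text::CSV has to be installed from CPAN. The sample record is made up. The point is that a quoted field containing a comma stays in one piece, which a plain split on commas would not guarantee.

```perl
use strict;
use warnings;
use Text::CSV;    # assumed choice of module; installed from CPAN, not core Perl

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

# Hypothetical sample record; the quoted field contains a comma.
my $line = q{x1,y2,z3,"Smith, John",note: ok};

# Let the module handle quoting rules instead of split(/,/).
$csv->parse($line) or die "Could not parse line: $line";
my @fields = $csv->fields;

# Scrub each parsed field, then prepend the joined key as in the question.
s/[:'"\\]/ /g for @fields;
unshift @fields, join '_', @fields[0..2];

print join('|', @fields), "\n";
```

When reading from a filehandle, the module's getline method can take the place of the manual read, chomp, and parse steps.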
