DKru - 10 days ago
Perl Question

Perl: Split CSV at given string and use specific string as file name

So I have several large CSV files (roughly 6,000 rows and about 60 columns each) that I would like to split into separate CSV files at a given string. The number of lines between occurrences of the string varies, and each output file should be named after the string that appears in the first column of its first row. For example:

Peter B1 C1 D1
A2 B2 C2 D2
A3 B3 C3 D3
END B4 C4 D4
Jack B5 C5 D5
A6 B6 C6 D6
A7 B7 C7 D7
END B8 C8 D8
Billy B9 C9 D9
A10 B10 C10 D10
A11 B11 C11 D11
END B12 C12 D12

So there should be three files named Peter, Jack and Billy, with the word END signalling the last row to be written to each file. Peter contains the range A1 (which holds the word Peter) to D4, Jack A5 to D8, and Billy A9 to D12.

I have this so far:

use strict;
use warnings;

my $split_woord = 'END'; #word that signals file to be split
print "Input file: ";
my $file_name = <STDIN>;

my $input_file = "file locataion/$file_name.csv";

### OPEN
open (INPUT, ">", "$input_file") or die "Can't open $file_name: $!\n";

my $name= undef;

while (<INPUT>){

my $line = $_;

my ($a,$b,$c,$d)=split('\,', $line);

until ($a eq $split_woord){ #loop until column 1 reads 'END', then restart
$name eq $a; #want to indicate first line

my $output_file = "file_location/$name.csv";
open (OUTPUT, ">>", "$output_file") or die "Can't create $output_file: $!\n";

print OUTPUT "$a,$b,$c,$d\n";




I can't seem to get it to loop properly, and am also struggling to use the first column/row to act as the name for the file. Any help will be tremendously appreciated!!! TIA


First of all, your line:

open (INPUT, ">", "$input_file") 

opens the file for WRITING, which also truncates it -- you wanted to read it, right? Use '<' as the mode instead.
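
As a minimal sketch of the difference between the two modes, the snippet below works on a throwaway temp file rather than real data (the file contents and variable names are made up for illustration). Reading with '<' leaves the file alone, while opening with '>' empties it before anything is written.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file so the example does not touch real data.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "Peter,B1,C1,D1\n";
close $tmp_fh or die "Can't close $tmp_path: $!";

# '<' opens for reading and leaves the contents alone.
open my $in, '<', $tmp_path or die "Can't open $tmp_path for reading: $!";
my $first_line = <$in>;
close $in;

# '>' opens for writing and truncates the file to zero length first.
open my $out, '>', $tmp_path or die "Can't open $tmp_path for writing: $!";
close $out or die "Can't close $tmp_path: $!";

# -z is true when the file exists and has zero size.
my $is_empty = -z $tmp_path;
```

So with '>' in the original script, the input CSV would be wiped the moment the open succeeded.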

If you're really dealing with a true CSV file, you may want to explore Text::CSV instead of splitting on bare commas. It is available from CPAN (it is not part of the core distribution), and it handles the inevitable field that contains a comma of its own:

ID        Quote                Date
1         No, I'm fine         1/1/2016
2         Roger Winco          5/1/2016
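
To make the problem concrete, here is a small sketch comparing a bare split with Text::CSV on the first row of the table above. It assumes the row is stored with the quote field wrapped in double quotes, which is how CSV normally protects an embedded comma. The Text::CSV part is guarded with eval so the snippet still runs if the module is not installed.

```perl
use strict;
use warnings;

# A row whose second field contains a comma, quoted as CSV requires
# (the exact quoting is an assumption about how the file is stored).
my $row = q{1,"No, I'm fine",1/1/2016};

# Naive split: the comma inside the quotes is treated as a separator.
my @naive = split /,/, $row;
print scalar(@naive), " fields from split\n";

# Text::CSV understands the quoting. It is a CPAN module, so only use it
# if it can actually be loaded on this machine.
my @parsed;
if (eval { require Text::CSV; 1 }) {
    my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
    $csv->parse($row) or die "Can't parse row: " . $csv->error_diag;
    @parsed = $csv->fields;
    print scalar(@parsed), " fields from Text::CSV\n";
}
```

The bare split breaks the quote into two pieces and leaves the double quotes attached, whereas Text::CSV returns the three fields a reader would expect.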

That said, the real issue at hand...

Assuming the names don't repeat, you should be able to open an output filehandle and continue using it until it hits the terminating word:


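# Continues from your script: assumes $file_name has been chomp-ed
# and $split_woord is set as in the question.
my $OUTPUT;
open my $INPUT, '<', "$file_name.csv" or die "Can't open $file_name.csv: $!";
while (<$INPUT>) {
  # Only the first column is needed, so split into at most two pieces
  my ($first) = split /,/, $_, 2;

  # No output file open: this row starts a new block, so name the file after column 1
  if (!defined $OUTPUT) {
    open $OUTPUT, '>', "$first.csv" or die "Can't create $first.csv: $!";
  }

  print $OUTPUT $_;

  # The terminating row has been written; close so the next row starts a new file
  if ($first eq $split_woord) {
    close $OUTPUT;
    $OUTPUT = undef;
  }
}
close $INPUT;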
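
For completeness, the same approach can be wrapped in a self-contained sub. This is only a sketch under a few assumptions: the name split_csv_blocks, the $out_dir parameter and the ".csv" suffix are illustrative choices, not something from the question, and it still splits on bare commas, so it relies on the first column never being quoted. Note also that a file name read from STDIN, as in the question, keeps its trailing newline until you chomp it.

```perl
use strict;
use warnings;
use File::Spec;

# Split $input_path into one file per block. A block starts on the first
# non-blank row after the previous block ended and finishes on the row whose
# first column equals $end_word. Returns the list of files written.
sub split_csv_blocks {
    my ($input_path, $out_dir, $end_word) = @_;

    open my $in, '<', $input_path or die "Can't open $input_path: $!";

    my ($out, $out_path, @written);
    while (my $line = <$in>) {
        next unless $line =~ /\S/;    # skip blank lines

        # Only the first column matters, so limit the split to two pieces.
        my ($first) = split /,/, $line, 2;
        chomp $first;                 # a row without commas keeps its newline

        # No file open yet: this row names the new output file.
        if (!defined $out) {
            $out_path = File::Spec->catfile($out_dir, "$first.csv");
            open $out, '>', $out_path or die "Can't create $out_path: $!";
            push @written, $out_path;
        }

        print {$out} $line;

        # The END row is written before closing, so it stays in the file.
        if ($first eq $end_word) {
            close $out or die "Can't close $out_path: $!";
            undef $out;
        }
    }

    # Input ended without a final END row: close the last file anyway.
    close $out if defined $out;
    close $in;
    return @written;
}
```

A call such as split_csv_blocks('input.csv', '.', 'END') (hypothetical paths) would then write Peter.csv, Jack.csv and Billy.csv for the sample data in the question, provided the columns really are comma-separated.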