Chris - 4 months ago
Perl Question

extract specific fields and combine into 1 text file with perl

I am trying to use perl to extract specific fields from all text files in a directory and output them to one new file, with each text file on a new line.

input

#Sample = xxxxx
#Sample Type = xxxxxx
#Build = xxxxxxx
#Platform = xxxxxxx
#Display Name= XXXXX (keep this field without the #)
#identifier = xxxxxx (keep this field without the #)
#Gender = xxxxx (keep this field without the #)
#Control Gender = xxxxx
#Control Sample = xxxxx
#Quality = X.XXXXXX (keep this field without the # and X.XXX)


desired output (fields to keep from each text file)

Display Name= XXXXX (keep this field without the #)
identifier = xxxxxx (keep this field without the #)
Gender = xxxxx (keep this field without the #)


I took @Borodin's suggestion from an earlier post and tried a script to accomplish this, which I think is close:

#!/bin/perl
use strict; use warnings;
perl -ne '(s/^#(Display Name|identifier|Gender)/$1/ or s/^#(Quality = \d\.\d{3})\d+/$1/) and print' *.txt > all.txt

Running it gives this error:

perl "C:\cygwin\home\get_all_qc2.pl"
syntax error at C:\cygwin\home\get_all_qc2.pl line 3, near "-ne"
Execution of C:\cygwin\home\get_all_qc2.pl aborted due to compilation errors.


Thank you :).

Answer

OK, to start with, if you're running this code as a script from inside a .pl file, you're doing it wrong. What you've done is written the shell invocation of a Perl one-liner into your file and expected it to execute as Perl code!

So, to start, we change your file to something like this:

#!/bin/perl
use strict; use warnings;
(s/^#(Display Name|identifier|Gender)/$1/ or s/^#(Quality = \d\.\d{3})\d+/$1/) and print;

And then we just invoke it with perl file.pl.

But that doesn't actually do what you want: nothing reads the input any more, so $_ is never set and nothing gets printed.
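For context, the -n switch in the original one-liner wraps the code in a while (<>) { ... } loop, and that loop is the piece a bare script has to supply itself. A minimal sketch of the one-liner written out as a script, assuming the field names from the question; keep_field is a made-up helper name, and the @ARGV guard is an addition for this sketch (plain -n would fall back to reading standard input):

```perl
#!/bin/perl
use strict; use warnings;

# Hypothetical helper: returns the line with its leading # removed
# (and Quality cut to three decimals), or undef for unwanted lines.
sub keep_field {
    my ($line) = @_;
    ($line =~ s/^#(Display Name|identifier|Gender)/$1/
        or $line =~ s/^#(Quality = \d\.\d{3})\d+/$1/) or return;
    return $line;
}

# This is the loop that -n adds for you: <> reads every line of
# every file named on the command line.
if (@ARGV) {
    while (my $line = <>) {
        my $kept = keep_field($line);
        print $kept if defined $kept;
    }
}
```

Saved as file.pl, this would be run from a shell that expands wildcards (Cygwin bash, for instance) as perl file.pl *.txt > all.txt.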

So, instead, we do something like this:

#!/bin/perl
use warnings; use strict; # Good Perl practice to use these, always

my $file = $ARGV[0]; # Grabs the filename from the cmdline arguments
open my $fh, '<', $file or die "Cannot open $file: $!"; # Opens the file

while (my $line = <$fh>) {
    # Match and capture your desired elements, cutting Quality to three decimals
    if ($line =~ /^#(Quality\s*=\s*\d\.\d{3})/ or $line =~ /^#((?:Display Name|identifier|Gender)\s*=.*)/) {
        print "$1\n"; # If we found anything, print it on its own line
    }
}

close $fh;

Then we execute it with perl file.pl input.txt, sit back, and let it run.
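The script above reads a single file, while the question asks for every text file in a directory. A rough sketch of one way to extend it, assuming "each text file on a new line" means one tab-separated output line per input file; extract_fields is a hypothetical helper name, and the tab separator is a choice made here, not something the question specifies:

```perl
#!/bin/perl
use strict; use warnings;

# Hypothetical helper: given the lines of one file, return the wanted
# fields without the leading #, with Quality cut to three decimals.
sub extract_fields {
    my (@lines) = @_;
    my @kept;
    for my $line (@lines) {
        if ($line =~ /^#(Quality\s*=\s*\d\.\d{3})/
            or $line =~ /^#((?:Display Name|identifier|Gender)\s*=.*?)\s*$/) {
            push @kept, $1;   # $1 comes from whichever match succeeded
        }
    }
    return @kept;
}

# One output line per input file named on the command line
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my @fields = extract_fields(<$fh>);
    close $fh;
    print join("\t", @fields), "\n" if @fields;
}
```

From a shell that expands wildcards, this could be run as perl file.pl *.txt > all.txt. Writing all.txt somewhere outside the input directory avoids it being picked up as an input file on a later run.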
