user3781528 - 5 months ago
Perl Question

remove lines from text file that contain specific text

I'm trying to remove lines that contain 0/0 or ./. in column 71 "FORMAT.1.GT" from a tab delimited text file.

I've tried the following code but it doesn't work. What is the correct way of accomplishing this? Thank you

my $cmd6 = `fgrep -v "0/0" | fgrep -v "./." $Variantlinestsvfile > $MDLtsvfile`; print "$cmd6";

Answer

You can call a one-liner, as Borodin and zdim suggested. Which one is right for you is still unclear, because you haven't said whether "column 71" means the 71st tab-separated field of a line or the 71st character of that line. Consider

12345\t6789

Now what is the 2nd column? Is it the character 2 or the field 6789? Borodin's answer assumes it's 6789, while zdim's assumes it's 2. Each covers one of the two cases, but both are stand-alone programs meant to be run from the command line.
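To make the two readings concrete, here is a small sketch using the sample line above. The helper names char_column and field_column are made up for this illustration and are not part of either answer; both take a 1-based column number.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; $n is a 1-based column number.

# "Column" read as the n-th character of the line.
sub char_column {
    my ($line, $n) = @_;
    return substr($line, $n - 1, 1);
}

# "Column" read as the n-th tab-separated field of the line.
sub field_column {
    my ($line, $n) = @_;
    chomp $line;
    my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
    return $fields[$n - 1];
}

my $line = "12345\t6789\n";
print char_column($line, 2), "\n";    # prints 2
print field_column($line, 2), "\n";   # prints 6789
```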

If you want to integrate that into your Perl script you could do it like this:

Replace this line:

my $cmd6 = `fgrep -v "0/0" | fgrep -v "./." $Variantlinestsvfile > $MDLtsvfile`; print "$cmd6"; 

with this snippet:

open( my $fh_in, '<', $Variantlinestsvfile ) or die "cannot open $Variantlinestsvfile: $!\n";
open( my $fh_out, '>', $MDLtsvfile ) or die "cannot open $MDLtsvfile: $!\n";
while( my $line = <$fh_in> ) {

    # character-based:
    print $fh_out $line unless (substr($line, 70, 3) =~ m{(?:0/0|\./\.)});

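    # tab/field-based (split on tabs only, so empty fields and fields
    # containing spaces do not shift the column positions):
    my @fields = split(/\t/, $line);
    print $fh_out $line unless (defined $fields[70] && $fields[70] =~ m|([0.])/\1|);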
}
close($fh_in);
close($fh_out);

Use either the character-based line or the tab/field-based lines. Not both!
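If the field-based reading is the right one, the test can be pulled into a small subroutine so it is easy to try on sample lines before running it on the real file. This is only a sketch under some assumptions: is_uninformative and its 0-based $idx parameter are names invented here, the sample lines are made up, and the pattern is anchored so that only a field that is exactly 0/0 or ./. matches, which is slightly stricter than the unanchored pattern in the snippet.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the tab-separated field at 0-based index $idx
# is exactly "0/0" or "./.". Lines with too few fields return false.
sub is_uninformative {
    my ($line, $idx) = @_;
    chomp $line;
    my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
    return 0 unless defined $fields[$idx];
    return $fields[$idx] =~ m{\A(?:0/0|\./\.)\z} ? 1 : 0;
}

# Made-up sample lines with the genotype in the 3rd field (index 2).
my @sample = ("chr1\t100\t0/1\n", "chr1\t200\t0/0\n", "chr1\t300\t./.\n");
my @kept   = grep { !is_uninformative($_, 2) } @sample;
print @kept;   # prints only the 0/1 line
```

Inside the while loop of the snippet, the field-based lines would then shrink to: print $fh_out $line unless is_uninformative($line, 70);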

Borodin and zdim condensed this snippet into a one-liner, but there is no reason to call a one-liner from inside a Perl script when the script can do the filtering itself.
