Rob - 1 year ago
Perl Question

Remove blank regex hits from an array

I am running a regex search and want to print only the hits in FASTA format (two lines per record: the first begins with a greater-than sign ">" followed by the hit, and the second contains the hit again without the ">").

I can successfully generate a multi-FASTA output file, but the ">" and the line breaks are written to the file whether or not there is a hit.

Generated output:





Desired output:


Here is my code:

use warnings;
use strict;

open(CLUSTER, ">", "SequencesToCluster.txt") or die $!;

my @TrimmedSequences;

my @ArrayofFiles = glob ("~/BLASTdb/Individual_Sequences_*");

foreach my $file (@ArrayofFiles){
    open (my $sequence, $file) or die "can't open file: $!";
    while (my $line = <$sequence>){
        if ($line !~/^>/){
            my $seq = $line;
            $seq =~ s/\R//g;
            push(@TrimmedSequences, ">", $1, "\n", $1, "\n");
            #Here I believe I need to manipulate the array to get rid of blank fastas
        }
    }
}
print CLUSTER @TrimmedSequences;

Answer

If you're filtering an array, the tool is grep.


my @new_array = grep { not /^\s*$/ } @old_array;

This filters out any element that consists only of whitespace. In your case, since the unwanted elements are either empty or just a >, use /^>?\s*$/ instead.
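As a rough illustration of the grep filter, here is a minimal sketch. The sample data is made up, and it assumes each array element holds one complete line. In the question's array the ">" and "\n" are pushed as separate elements, so the same filter would also remove them from records that did have a hit.

```perl
use strict;
use warnings;

# Hypothetical sample data: each element is one whole line of output.
my @old_array = (">ACGT", ">", "", "   ", ">TTGA", "\n");

# Keep only elements that are NOT "an optional > followed by nothing but whitespace".
my @new_array = grep { not /^>?\s*$/ } @old_array;

# Print the surviving elements, one per line.
print "$_\n" for @new_array;
```

Only the two elements that carry actual sequence text survive the filter.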

However, that fixes a problem that need not exist in the first place. You could instead attach the push to the regex match with &&:

       && push(@TrimmedSequences, ">", $1, "\n", $1, "\n");  

That will only push when the regex matches, because && short-circuits and skips the push when the match fails.
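A self-contained sketch of the conditional push might look like the following. The question's actual regex is not shown, so the pattern `(CCCGGG)` and the in-memory `@lines` list (standing in for the sequence files) are placeholders chosen purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical input lines standing in for the contents of the sequence files.
my @lines = (">seq1\n", "AAACCCGGGTTT\n", ">seq2\n", "TTTTTTTT\n");

# Hypothetical pattern with one capture group; substitute the real regex.
my $pattern = qr/(CCCGGG)/;

my @TrimmedSequences;
for my $line (@lines) {
    next if $line =~ /^>/;            # skip FASTA header lines
    (my $seq = $line) =~ s/\R//g;     # copy the line and strip line endings

    # && short-circuits: push runs only when the match succeeds,
    # so $1 is always defined when it is used here.
    $seq =~ $pattern && push(@TrimmedSequences, ">", $1, "\n", $1, "\n");
}

print @TrimmedSequences;
```

With this sample input only the first sequence matches, so a single two-line record is produced and the non-matching line contributes nothing to the array.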