Rob Rob - 3 months ago 7
Perl Question

Assigning regex search to variable: Uninitialized variable error

I am opening files in a directory; each file contains two lines of sequence. The top sequence is longer than the bottom one and contains it. Once the bottom sequence is found in the top sequence, I would like to extend it by two flanking letters in each direction. I am attempting this with a regex match, but I am getting an uninitialized-value error for the $newsequence variable.
Here is what a typical file looks like:

>CCCCNNNNNCCCC
NNNNN


I would like to print to one file all the sequences in the following format:

>CCCCNNNNNCCCC
CCNNNNNCC


Here is my code so far:

use strict;
use warnings;

my ($directory) = @ARGV;
my @array = glob "$directory/*";
my $header;
my $sequence;
my $newsequence;

open(OUT, ">", "/path/to/out.txt") or die $!;
foreach my $file (@array){
    open (my $fh, $file) or die $!;
    while (my $line = <$fh>){
        chomp $line;
        if ($line =~ /^>/) {
            $header = $line;
        } elsif ($line =~ /^[CN]/) {
            $sequence = $line;
        }
        my ($newsequence) = $header =~ /(([CN]{2})($sequence)([CN]{2}))/;
    }
    print OUT $header, "\n", $newsequence, "\n";
}


How can I fix my regex assignment to $newsequence to get the expected output? Thanks.

Answer

This line is wrong:

my ($newsequence) = $header =~ /(([CN]{2})($sequence)([CN]{2}))/; 

The my keyword is creating a new variable $newsequence local to the while loop, not assigning the variable in the main script. So when you try to write $newsequence after the loop is done, the variable is still uninitialized.
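
To see the scoping effect in isolation, here is a minimal sketch. The sub names shadow_demo and assign_demo are made up for illustration, and the sequence is hard-coded in the pattern to keep the example short:

```perl
use strict;
use warnings;

# Hypothetical demo: "my" inside the loop declares a second, inner variable.
sub shadow_demo {
    my $newsequence;                      # outer variable, never assigned
    for my $header ('>CCCCNNNNNCCCC') {
        # this "my" creates a new $newsequence that lives only in the loop body
        my ($newsequence) = $header =~ /([CN]{2}NNNNN[CN]{2})/;
    }
    return $newsequence;                  # the outer one, still undef
}

# Hypothetical demo: without "my", the outer variable receives the match.
sub assign_demo {
    my $newsequence;
    for my $header ('>CCCCNNNNNCCCC') {
        ($newsequence) = $header =~ /([CN]{2}NNNNN[CN]{2})/;
    }
    return $newsequence;
}

print defined(shadow_demo()) ? "defined\n" : "undef\n";
print assign_demo(), "\n";
```

The first call reports undef because the matched value is stored in the inner variable, which is gone by the time the loop ends. The second call returns the extended sequence.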

Either put the print statement inside the while loop, or remove the my keyword in this assignment.

Also, you should put that assignment statement inside the elsif block. Otherwise, the match also runs on the header line, before $sequence has been assigned. So the whole thing should look like:

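foreach my $file (@array){
    open (my $fh, "<", $file) or die "Cannot open $file: $!";
    while (my $line = <$fh>){
        chomp $line;
        if ($line =~ /^>/) {
            # header line: remember it for the sequence line that follows
            $header = $line;
        } elsif ($line =~ /^[CN]/) {
            $sequence = $line;
            # no "my" here, so this assigns the $newsequence declared at the top
            ($newsequence) = $header =~ /(([CN]{2})($sequence)([CN]{2}))/;
            print OUT $header, "\n", $newsequence, "\n";
        }
    }
    close $fh;
}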
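
If a file ever contains a bottom sequence that is not found in the header with two flanking letters on each side, the match fails and $newsequence is undef again. One way to guard against that is to move the match into a small helper and check its result before printing. This is only a sketch: extend_by_flanks is a hypothetical name, and the \Q...\E quoting is an addition that makes the interpolated sequence match literally.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original code): returns $sequence
# widened by two [CN] letters on each side as found in $header,
# or undef when there is no such match.
sub extend_by_flanks {
    my ($header, $sequence) = @_;
    # \Q...\E quotes any regex metacharacters in the interpolated sequence
    my ($extended) = $header =~ /([CN]{2}\Q$sequence\E[CN]{2})/;
    return $extended;
}

my $extended = extend_by_flanks('>CCCCNNNNNCCCC', 'NNNNN');
print defined $extended ? "$extended\n" : "no match\n";
```

Inside the elsif block you could then call the helper and skip the print (or warn) when it returns undef.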