Aditya J. - 3 months ago
Perl Question

Perl: String in Substring or Substring in String

I'm working with DNA sequences in a file, and this file is formatted something like this, though with more than one sequence:

>name of sequence
EXAMPLESEQUENCEATCGATCGATCG


I need to be able to tell whether a variable (which also holds a sequence) matches any of the sequences in the file and, if it does, the name of the sequence it matches. Because of the nature of these sequences, my entire variable could be contained in a line of the file, or a line of the file could be part of my variable.
Right now my code looks something like this:

use warnings;
use strict;
my $filename = "/users/me/file/path/file.txt";
my $exampleentry = "ATCG";
my $returnval = "The sequence does not match any in the file";
open file, "<$filename" or die "Can't find file";
my @Name;
my @Sequence;
my $inx = 0;
while (<file>){
$Name[$inx] = <file>;
$Sequence[$inx] = <file>;
$indx++;
}unless(index($Sequence[$inx], $exampleentry) != -1 || index($exampleentry, $Sequence[$inx]) != -1){
$returnval = "The sequence matches: ". $Name[$inx];
}
print $returnval;


However, even when I purposely set $exampleentry to a sequence from the file, the script still prints
The sequence does not match any in the file
Also, when running the code, I get
Use of uninitialized value in index at thiscode.pl line 14, <file> line 3002.
as well as
Use of uninitialized value within @Name in concatenation (.) or string at thiscode.pl line 15, <file> line 3002.


How can I perform this search?

Answer

If I understand you correctly, this should solve your problem:

use feature qw(say);
use strict;
use warnings;

my $filename = "file.txt";
my $exampleentry = "ATCG";
my $returnval = "The sequence does not match any in the file";
open (my $fh, '<', $filename ) or die "Can't find file: $!";
my @name;
my @sequence;
my $inx = 0;
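# Assumes two-line records: a ">name" header line, then the sequence line.
while (defined(my $header = <$fh>)) {
    my $line = <$fh>;
    last unless defined $line;    # header with no sequence line after it
    chomp ($name[$inx] = $header);
    chomp ($sequence[$inx] = $line);
    if (
        index($sequence[$inx], $exampleentry) != -1
        || index($exampleentry, $sequence[$inx]) != -1
    ) {
        $returnval = "The sequence matches: ". $name[$inx];
        last;
    }
    $inx++;    # move to the next slot so earlier records are not overwritten
}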
say $returnval;

If this is not what you are looking for, please clarify your question.

Notes:

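  • I changed the variable names to all lower case, which is the usual Perl convention for lexical variables (with snake_case for multi-word names). For example, @Name becomes @name.

  • I changed the open() call to the recommended 3-argument form with a lexical filehandle, and included $! in the die message so the actual error is reported; see Don't Open Files in the old way for more information.

  • I used the say feature instead of print; say appends a newline to its output.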

  • Added a chomp after each readline to avoid storing newline characters in the arrays.
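
If more than one sequence in the file can match, a variant that collects every matching name may be more useful than stopping at the first hit. The following is only a minimal sketch under stated assumptions: the helper name matching_names, the sample records, and the stripping of the leading ">" are all hypothetical and not part of the original question or answer. It assumes two-line records (a ">name" header followed by one sequence line) and reads from an in-memory filehandle so that it runs without a file on disk.

```perl
use strict;
use warnings;
use feature qw(say);

# Hypothetical helper (not from the original answer): returns the names of
# all records whose sequence contains $query or is contained in $query.
# Assumes two-line records: a ">name" header followed by one sequence line.
sub matching_names {
    my ($fh, $query) = @_;
    my @matches;
    while (defined(my $header = <$fh>)) {
        my $sequence = <$fh>;
        last unless defined $sequence;      # header without a sequence line
        chomp($header, $sequence);          # strip the trailing newlines
        next unless length $sequence;       # an empty string is found in any string
        (my $name = $header) =~ s/^>//;     # drop the leading ">" marker
        if (index($sequence, $query) != -1 || index($query, $sequence) != -1) {
            push @matches, $name;
        }
    }
    return @matches;
}

# Made-up sample data, read through an in-memory filehandle.
my $data = join "\n",
    ">seq1", "EXAMPLESEQUENCEATCGATCGATCG",
    ">seq2", "GGGCCC",
    ">seq3", "TCG",
    "";
open(my $fh, '<', \$data) or die "Can't open in-memory file: $!";
my @found = matching_names($fh, "ATCG");
close $fh;

say @found
    ? "The sequence matches: @found"
    : "The sequence does not match any in the file";
```

With this sample data, seq1 matches because it contains "ATCG", and seq3 matches in the other direction because "TCG" is contained in "ATCG". The second kind of match depends on the chomp: without it the stored sequence would be "TCG" plus a newline, which is not a substring of "ATCG".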