Aditya J. - 2 months ago
Perl Question

Perl: Assigning a variable one of 3 possible values

I have a DNA sequence. Let's call it "ATCG". I have 2 small databases of DNA sequences in 2 separate files, which we will call "db1.txt" and "db2.txt". Both databases are formatted as follows:

>name of sequence
>name of another sequence

I want to know if my DNA sequence is contained in one of the databases, and if so which one. My result, then, has 3 possible values: my sequence is in neither database, in db1, or in db2. Here's my code:

use warnings;
use strict;
my $entry = 'ATCG';
my $returnval = "The sequence is from neither database";

#if in db1
my $name1;
my $seq1;
open (my $database1, "<", "db1.txt") or die "Can't find db1";
while (<$database1>){
    chomp ($name1 = <$database1>);
    chomp ($seq1 = <$database1>);
    if (
        index($seq1, $entry) != -1
        || index($entry, $seq1) != -1
    ) {
        $returnval = "The sequence is from db1: ". $name1;
    }
}

#If in db2:
my $name2;
my $seq2;
open (my $database2, "<", "db2.txt") or die "Can't find db2";
while (<$database2>){
    chomp ($name2 = <$database2>);
    chomp ($seq2 = <$database2>);
    if (
        index($seq2, $entry) != -1
        || index($entry, $seq2) != -1
    ) {
        $returnval = "The sequence is from db2: ". $name2;
    }
}

print $returnval . "\n";

There are a few problems with this code (probably more than a few). No matter what my sequence is, $returnval ends up as "The sequence is from db2: " with no name at the end. Furthermore, it seems that $name2 and $seq2 are uninitialized values, even though the code is identical to that for db1. If I remove the entire section that tests db2, the code returns "The sequence is from db1: " followed by the appropriate name for some sequences I copied and pasted from the database, but returns "The sequence is from neither database" for others.

What am I doing wrong? How do I fix the uninitialized values, and why is the code for db2 not working?

I forgot to mention that outputting that the sequence is in db2 takes precedence over outputting that it is in db1, should a sequence be in both.


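The main issue is the condition of each while loop. while (<$database1>) reads a line into $_ and throws it away, and the loop body then reads two more, so every iteration consumes three lines instead of two. The $name and $seq variables therefore drift out of step with the records, and once the file runs out the reads return undef, which is where the uninitialized-value warnings come from. An undefined sequence is treated as an empty string, and index($entry, '') returns 0, so the test succeeds with no name to print, which would explain the bare "The sequence is from db2: " result. Removing the read from the condition and checking for end-of-file inside the loop should fix the problem. It's also possible to loop over the two databases and apply the same logic to both, so you only need to write the loop once.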

use warnings;
use strict;
my $entry = 'ATCG';
my $returnval = "The sequence is from neither database";
my @files = qw(db2 db1); # db2 first, because a match there takes precedence

FILE:
for my $file (@files) {
    open my $fh, '<', "$file.txt" or die "Error opening $file: $!";
    while (1) {
        # Read one record: a name line followed by a sequence line
        my $name = <$fh>;
        my $seq  = <$fh>;
        if (not defined $seq) {
            warn "Odd number of lines in $file" if defined $name;
            last; # Reached end of file
        }
        chomp($name, $seq);
        if (
            index($seq, $entry) != -1
            or index($entry, $seq) != -1
        ) {
            $returnval = "The sequence is from $file: $name";
            last FILE; # No need to search the others
        }
    }
}

print "$returnval\n";
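
To see the effect of the read in the while condition in isolation, here is a minimal sketch that runs both loop styles over the same in-memory data. The sample records and the sub names read_with_condition and read_in_pairs are made up for illustration and are not part of your databases or code.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for db1.txt (an assumption, not from
# the question): each record is a name line followed by a sequence line.
my $sample = ">seq A\nAAAATCGA\n>seq B\nGGGGCCCC\n";

# Mimics the question's loop: the while condition reads one line into $_
# and the body reads two more, so three lines are consumed per iteration.
sub read_with_condition {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @records;
    while (<$fh>) {
        my $name = <$fh>;
        my $seq  = <$fh>;
        chomp $name if defined $name;
        chomp $seq  if defined $seq;
        push @records, [ $name, $seq ];
    }
    close $fh;
    return @records;
}

# Reads exactly two lines per iteration and checks for end-of-file
# inside the loop instead of in the loop condition.
sub read_in_pairs {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @records;
    while (1) {
        my $name = <$fh>;
        my $seq  = <$fh>;
        last if not defined $seq;
        chomp($name, $seq);
        push @records, [ $name, $seq ];
    }
    close $fh;
    return @records;
}

print "Read in the while condition:\n";
for my $rec (read_with_condition($sample)) {
    my ($name, $seq) = map { $_ // 'undef' } @$rec;
    print "  name=$name seq=$seq\n";
}

print "Two reads per iteration:\n";
for my $rec (read_in_pairs($sample)) {
    my ($name, $seq) = map { $_ // 'undef' } @$rec;
    print "  name=$name seq=$seq\n";
}
```

With this sample, the first loop stores a sequence as the name and the next name line as the sequence, and its second pass gets undef for both, while the second loop returns the two records as written. The // operator used for printing needs Perl 5.10 or later.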