Ashley Ashley - 1 month ago 24
Perl Question

Counting number of pattern matches in Perl

I am VERY new to Perl, and to programming in general.
I have been searching for the past couple of days for how to count the number of pattern matches. I have had a hard time understanding others' solutions and applying them to the code I have already written.

Basically, I have a sequence and I need to find all the patterns that match [TC]C[CT]GGAAGC

I believe I have that part down, but I am stuck on counting the number of occurrences of each pattern match. Does anyone know how to edit the code I already have to do this? Any advice is welcome. Thanks!

#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;

# open fasta file for reading
unless ( open( FASTA, "<", '/scratch/Drosophila/dmel-all-chromosome-r6.02.fasta' ) ) {
    die "Can't open dmel-all-chromosome-r6.02.fasta for reading:", $!;
}

#split the fasta record
local $/ = ">";

#scan through fasta file
while (<FASTA>) {
    chomp;
    if ( $_ =~ /^(.*?)$(.*)$/ms ) {
        my $header = $1;
        my $seq = $2;
        $seq =~ s/\R//g; # \R removes line breaks
        while ( $seq =~ /([TC]C[CT]GGAAGC)/g ) {
            print $1, "\n";
        }
    }
}


Update: I have added

my @matches = $seq =~ /([TC]C[CT]GGAAGC)/g;
print scalar @matches;


into the code below. However, it seems to print 0 in front of each pattern match instead of printing the total number of pattern matches.

while (<FASTA>) {
    chomp;
    if ( $_ =~ /^(.*?)$(.*)$/ms ) {
        my $header = $1;
        my $seq = $2;
        $seq =~ s/\R//g; # \R removes line breaks
        while ( $seq =~ /([TC]C[CT]GGAAGC)/g ) {
            print $1, "\n";
            my @matches = $seq =~ /([TC]C[CT]GGAAGC)/g;
            print scalar @matches;
        }
    }
}


Edit: I need the output to list every pattern match found. I also need the total number of matches found. For example:

CCTGGAAGC

TCTGGAAGC

TCCGGAAGC

3 matches found

Answer

As you have written the code, you have to count the matches yourself:

local $/ = ">";
my $count = 0;

# scan through fasta file
while (<FASTA>) {
    chomp;
    if ( $_ =~ /^(.*?)$(.*)$/ms ) {
        my $header = $1;
        my $seq = $2;
        $seq =~ s/\R//g; # \R removes line breaks
        while ( $seq =~ /([TC]C[CT]GGAAGC)/g ) {
            print $1, "\n";
            $count++; # one more match found
        }
    }
}
print "Found $count matches\n";

should do the job.
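
For comparison, here is a minimal sketch of the list-context approach the question was reaching for, run on a single in-memory sequence rather than the FASTA file. The helper name find_motifs and the test sequence are made up for illustration and are not part of the original code. It also tallies each distinct variant in a hash, in case "occurrences of each pattern match" means per-variant counts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns every match of
# the motif in a sequence, in order of appearance.
sub find_motifs {
    my ($seq) = @_;
    # In list context, a /g match returns all captured substrings at once.
    my @matches = $seq =~ /([TC]C[CT]GGAAGC)/g;
    return @matches;
}

# Made-up test sequence containing the three variants from the question.
my $seq = 'AACCTGGAAGCTTTCTGGAAGCGGTCCGGAAGCAA';

my @matches = find_motifs($seq);
print "$_\n" for @matches;

# Tally how often each distinct variant occurs.
my %tally;
$tally{$_}++ for @matches;
print "$_: $tally{$_}\n" for sort keys %tally;

# An array in scalar context yields its element count.
printf "%d matches found\n", scalar @matches;
```

In the FASTA loop, the same idea would mean collecting the matches once per record, outside any inner while loop, and adding scalar @matches to a running total. Note that /g scans for non-overlapping matches, so a motif that starts on the last base of the previous match would not be found by either approach.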

HTH Georg
