mkHun - 6 months ago
Perl Question

Is it possible to count the duplicates in two columns using a single hash?

My input data is as follows. From this data I want the unique values of the first column (ID) and of the third column (p1, p2 .. p5), together with the count of each.

ID M N
cc1 1 p1
cc1 10 p2
cc1 10 p2
cc2 1 p1
cc2 2 p5
cc3 2 p1
cc3 2 p4


The result I expect is:

ID M p1 p2 p3 p4 p5
cc1 3 1 2 0 0 0
cc3 2 1 0 0 1 0
cc2 2 1 0 0 0 1


For this I tried a hash of hashes together with a plain hash, and I get the output I expect. My question is whether this can be done with a single hash, because the same data is being stored in two different hashes.

my (%hash,%hash2);
<$fh>; # skip the header line
while (<$fh>)
{
chomp; # strip the newline so the third column matches p1 .. p5
my($first,$second,$third) = split("\t");
$hash{$first}{$third}++; # I tried $hash{$first}++{$third}++ but it throws a syntax error
$hash2{$first}++; # is it possible to get rid of this hash?
}
my @ar = qw(p1 p2 p3 p4 p5);
$, = "\t";
print @ar,"\n";
foreach (keys %hash)
{
print "$_\t$hash2{$_}\t";
foreach my $ary(@ar)
{
if(!$hash{$_}{$ary})
{
print "0\t";
}
else
{
print "$hash{$_}{$ary}\t";
}
}
print "\n";
}

Answer

There is no need for two hashes; the hash of hashes alone is enough. I have only modified your code slightly, see below.

use strict;
use warnings;
my %hash;
<DATA>;    # skip the header line (assumes the input sits in a __DATA__ section)
while (<DATA>)
{
    chomp;
    my($first,$second,$third) = split("\t");
    $hash{$first}{$third}++;    # count each (ID, N) pair
}
my @ar = qw(p1  p2  p3  p4  p5);
$, = "\t"; 
print @ar,"\n";
foreach (keys %hash)
{
#    print "$_\t$hash2{$_}\t";
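    my @in = values %{ $hash{$_} };    # explicit dereference; values on a hash reference is gone since Perl 5.24
    my $cnt = 0;
    $cnt += $_ for @in;                # add up the per-N counts; no string eval needed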
    print "$_\t$cnt\t";
    foreach my $ary(@ar)
    {
        if(!$hash{$_}{$ary})
        {
            print "0\t"; 
        }
        else
        {
            print "$hash{$_}{$ary}\t";
        }
    }
    print "\n";
}

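The hash of hashes already holds all the data: the first-level keys are the IDs and the second-level keys are the N values. Summing the values stored under an ID gives the total number of rows for that ID, which is exactly what the second hash was storing.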
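
The same idea can be sketched as a self-contained script. This is only an illustration, not the answer's exact code: the rows are inlined from the question instead of being read from a file, the names `@rows`, `%count` and `report_line` are made up for the example, and `sum` from the core module List::Util is swapped in to total the inner values.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Sample rows from the question as (ID, M, N) triples; inlined so the sketch runs on its own.
my @rows = (
    [qw(cc1 1 p1)], [qw(cc1 10 p2)], [qw(cc1 10 p2)],
    [qw(cc2 1 p1)], [qw(cc2 2 p5)],
    [qw(cc3 2 p1)], [qw(cc3 2 p4)],
);

my %count;    # $count{ID}{N} = number of rows with that ID and that N
for my $row (@rows) {
    my ($id, undef, $n) = @$row;    # the M column is not needed for counting
    $count{$id}{$n}++;
}

# Build one output line per ID: the ID, its total row count, then one count per N column.
# report_line is a hypothetical helper name used only in this sketch.
sub report_line {
    my ($id, @cols) = @_;
    my $total = sum(values %{ $count{$id} });    # takes the place of the second hash
    return join "\t", $id, $total, map { $count{$id}{$_} // 0 } @cols;
}

my @cols = qw(p1 p2 p3 p4 p5);
print join("\t", 'ID', 'M', @cols), "\n";
print report_line($_, @cols), "\n" for sort keys %count;
```

Sorting the keys makes the row order stable; with a bare `keys %hash` the order can change from run to run.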