Alpesh Querer - 1 year ago 66
Perl Question

# Impute class values based on map

I want to impute marker classes (either class A or class B), based on proximity of known marker classes. So for example if I know M1 and M4 are class A, then all markers positioned in the map between M1 and M4 can also be classified as A.

If I know marker M4 is class A and its position is chr1 13, and marker M7 is B with position 16, then we can classify all markers with position less than equal to (13+16)/2=14.5 as A and everything between 14.5 and 16 as B on the same chromosome. So M5 is A and M6 can be classified as B.

I have a map of sorted positions of markers

``````M0  chr1    9
M1  chr1    10
M2  chr1    11
M3  chr1    12
M4  chr1    13
M5  chr1    14
M6  chr1    15
M7  chr1    16
M8  chr2    1
M9  chr2    2
M10 chr2    3
M11 chr2    4
``````

So given a simple backbone of

``````M1  A
M4  A
M7  B
M8  B
M10 A
``````

I want to impute the rest of the markers on the map, if possible.

So my desired output is

``````M1  A
M2  A
M3  A
M4  A
M5  A
M6  B
M7  B
M8  B
M9  B
M10 A
``````

I am a biologist trying to learn a little bit of awk, and relaize this maybe just a computational problem and I`m not sure where to get started. Please help. I have access to unix cluster to run awk and perl. Please note, correct imputation can be only done between markers mapped to the same chromosome.

You never answered any of my questions, so here's a Perl solution that makes a lot of guesses

``````use strict;
use warnings 'all';
use autodie;

my (@markers, %markers);
{
open my \$fh, '<', 'markers.txt';

while ( <\$fh> ) {
my @marker = split;
push @markers, \@marker;
\$markers{\$marker[0]} = \$#markers;
}
}

my (\$i0, \$i1);

open my \$fh, '<', 'classes.txt';

while ( <\$fh> ) {

my (\$marker, \$class) = split;

\$i1 = \$markers{\$marker};
my \$m1 = \$markers[\$i1];
push @\$m1, \$class;

next unless defined \$i0;

my \$m0 = \$markers[\$i0];

next if \$m0->[1] ne \$m1->[1];          # Different chromosomes

my \$mid = ( \$m0->[2] + \$m1->[2] ) / 2; # Mid point between markers

for my \$m ( @markers[ \$i0+1 .. \$i1-1 ] ) {
push @\$m, \$m->[2] <= \$mid ? \$m0->[3] : \$m1->[3];
}
}
continue {
\$i0 = \$i1;
}

printf "%-4s%-8s%-4d%-s\n", @{\$_}[0..2], \$_->[3] // '' for @markers;
``````

### output

``````M0  chr1    9
M1  chr1    10  A
M2  chr1    11  A
M3  chr1    12  A
M4  chr1    13  A
M5  chr1    14  A
M6  chr1    15  B
M7  chr1    16  B
M8  chr2    1   B
M9  chr2    2   B
M10 chr2    3   A
M11 chr2    4
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download