Alpesh Querer Alpesh Querer - 4 months ago 15
Perl Question

Impute class values based on map

I want to impute marker classes (either class A or class B), based on proximity of known marker classes. So for example if I know M1 and M4 are class A, then all markers positioned in the map between M1 and M4 can also be classified as A.

If I know marker M4 is class A and its position is chr1 13, and marker M7 is B with position 16, then we can classify all markers with position less than equal to (13+16)/2=14.5 as A and everything between 14.5 and 16 as B on the same chromosome. So M5 is A and M6 can be classified as B.

I have a map of sorted positions of markers

M0 chr1 9
M1 chr1 10
M2 chr1 11
M3 chr1 12
M4 chr1 13
M5 chr1 14
M6 chr1 15
M7 chr1 16
M8 chr2 1
M9 chr2 2
M10 chr2 3
M11 chr2 4

So given a simple backbone of

M1 A
M4 A
M7 B
M8 B
M10 A

I want to impute the rest of the markers on the map, if possible.

So my desired output is

M1 A
M2 A
M3 A
M4 A
M5 A
M6 B
M7 B
M8 B
M9 B
M10 A

I am a biologist trying to learn a little bit of awk, and relaize this maybe just a computational problem and I`m not sure where to get started. Please help. I have access to unix cluster to run awk and perl. Please note, correct imputation can be only done between markers mapped to the same chromosome.


You never answered any of my questions, so here's a Perl solution that makes a lot of guesses

use strict;
use warnings 'all';
use autodie;

my (@markers, %markers);
    open my $fh, '<', 'markers.txt';

    while ( <$fh> ) {
        my @marker = split;
        push @markers, \@marker;
        $markers{$marker[0]} = $#markers;

my ($i0, $i1);

open my $fh, '<', 'classes.txt';

while ( <$fh> ) {

    my ($marker, $class) = split;

    $i1 = $markers{$marker};
    my $m1 = $markers[$i1];
    push @$m1, $class;

    next unless defined $i0;

    my $m0 = $markers[$i0];

    next if $m0->[1] ne $m1->[1];          # Different chromosomes

    my $mid = ( $m0->[2] + $m1->[2] ) / 2; # Mid point between markers

    for my $m ( @markers[ $i0+1 .. $i1-1 ] ) {
        push @$m, $m->[2] <= $mid ? $m0->[3] : $m1->[3];
continue {
    $i0 = $i1;

printf "%-4s%-8s%-4d%-s\n", @{$_}[0..2], $_->[3] // '' for @markers;


M0  chr1    9   
M1  chr1    10  A
M2  chr1    11  A
M3  chr1    12  A
M4  chr1    13  A
M5  chr1    14  A
M6  chr1    15  B
M7  chr1    16  B
M8  chr2    1   B
M9  chr2    2   B
M10 chr2    3   A
M11 chr2    4