AishwaryaKulkarni - 5 months ago
Bash Question

Finding zeros and replacing them with another number in a matrix file by awk

I have a matrix in which I want to replace every 0 with 0.1. In each line, the highest value should then be reduced by the total amount added (0.1 for each zero replaced).

No line will contain only zeroes, since this is a probability matrix where each line adds up to 1. If the highest number occurs more than once in a line (0.5, for example), any one of them can be changed. The first line will always be the only one with letters in it. The matrix below should go from

>ACTTT ASB 0.098
0 0 1 0
0.75 0 0.25 0
0 0 0 1
0 1 0 0
1 0 0 0
1 0 0 0
0 1 0 0
0 1 0 0


to

>ACTTT ASB 0.098
0.1 0.1 0.7 0.1
0.55 0.1 0.25 0.1
0.1 0.1 0.1 0.7
0.1 0.7 0.1 0.1
0.7 0.1 0.1 0.1
0.7 0.1 0.1 0.1
0.1 0.7 0.1 0.1
0.1 0.7 0.1 0.1


I tried something like this in a loop, adapted from previous answers here:

while read line ; do echo $line | awk 'NR>1{print gsub(/(^|[[:space:]])0([[:space:]]|$)/,"&")}'; echo $line | awk '{max=$2;for(i=3;i<=NF;i++)if($i>max)max=$i}END{print max}'; done < matrix_file

Answer

awk to the rescue!

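$ awk -v eps=0.01 '
      # return the index of the first largest field; i and mI are locals
      function maxIx(   i, mI) {
          mI = 1
          for (i = 2; i <= NF; i++)
              if ($mI < $i) mI = i
          return mI
      }
      NR > 1 {                     # skip the header line
          mX = maxIx()
          for (i = 1; i <= NF; i++)
              if ($i == 0) { $i = eps; $mX -= eps }   # move eps from the max to each zero
      }
      1' file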

>ACTTT  ASB  0.098
0.01 0.01 0.97 0.01
0.73 0.01 0.25 0.01
0.01 0.01 0.01 0.97
0.01 0.97 0.01 0.01
0.97 0.01 0.01 0.01
0.97 0.01 0.01 0.01
0.01 0.97 0.01 0.01
0.01 0.97 0.01 0.01

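The replacement value is defined as eps (0.01 here; pass -v eps=0.1 to get the values asked for in the question). Any sensible value should work fine, but the script doesn't check whether the maximum goes below zero.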
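
One possible way to add that missing check is sketched below. It is not part of the original answer. It counts the zeros on each line first and only redistributes when the largest value can absorb the whole deduction. The function name smooth_matrix is hypothetical, and leaving the line unchanged with a warning on stderr is an assumption made for illustration. Writing to "/dev/stderr" works in gawk, mawk and BSD awk, but may not work in every awk.

```shell
# smooth_matrix EPS FILE: hypothetical wrapper around the awk approach above.
smooth_matrix() {
    awk -v eps="$1" '
    NR > 1 {                              # the header line is passed through as-is
        zeros = 0; mX = 1
        for (i = 1; i <= NF; i++) {
            if ($i == 0) zeros++          # count fields that need replacing
            if ($i > $mX) mX = i          # remember the first largest field
        }
        if ($mX - zeros * eps > 0) {      # only act if the maximum stays positive
            $mX -= zeros * eps
            for (i = 1; i <= NF; i++)
                if ($i == 0) $i = eps
        } else if (zeros > 0) {
            # assumption: warn and keep the line rather than produce a negative value
            print "line " NR ": eps too large, left unchanged" > "/dev/stderr"
        }
    }
    1' "$2"
}
```

It would be called as smooth_matrix 0.1 matrix_file. Deducting zeros * eps in one step, rather than once per zero, lets the check run before any field is modified.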
