Vijay Bhaskarla - 3 months ago
Python Question

Number of values lying in a specified range

I have a data frame like the one below:

NC_011163.1:1
NC_011163.1:22
NC_011163.1:44
NC_011163.1:65
NC_011163.1:73
NC_011163.1:87
NC_011163.1:104
NC_011163.1:130
NC_011163.1:151
NC_011163.1:172
NC_011163.1:194
NC_011163.1:210
NC_011163.1:235
NC_011163.1:255
NC_011163.1:295
NC_011163.1:320
NC_011163.1:445
NC_011163.1:520


I would like to scan the data frame with a window of 210 and count the number of values that fall within each window.

Desired output:

output: Values
NC_011163.1:1-210 12
NC_011163.1:211-420 4
NC_011163.1:421-630 2


I'd greatly appreciate your inputs to solve this problem.

Thanks

V

Answer
awk -v t=210 'BEGIN{FS=":"}{id=$1; ++a[int(($2-1)/t)]}
   END{for(x in a){printf "%s:%d-%d\t%d\n",id,x*t+1,(x+1)*t,a[x]}}' file

will give this output (rows may come out in a different order, since awk does not fix the order of for(x in a)):

NC_011163.1:1-210       12
NC_011163.1:211-420     4
NC_011163.1:421-630     2
  • You don't need to know the maximum value or how many ranges the result will have. The command works that out for you.

  • The code is easy to follow, I think; most of it deals with formatting the output.
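
Since the question is tagged Python, the same bucketing idea can also be sketched there. This is only a rough sketch, not a tested drop-in: the function name count_in_windows is made up for illustration, and it assumes every input line has the form id:position, with windows that are 1-based and inclusive as in the desired output.

```python
from collections import Counter


def count_in_windows(lines, window=210):
    """Count positions per fixed-size window (hypothetical helper).

    Assumes each line looks like 'id:position'. Windows are 1-based and
    inclusive: 1-210, 211-420, ...
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        seq_id, pos = line.rsplit(":", 1)
        # (pos - 1) // window maps 1..210 -> 0, 211..420 -> 1, and so on
        counts[(seq_id, (int(pos) - 1) // window)] += 1
    # Sort by id, then window index, so the rows come out in order
    return [
        (f"{seq_id}:{idx * window + 1}-{(idx + 1) * window}", n)
        for (seq_id, idx), n in sorted(counts.items())
    ]


# Positions taken from the question
positions = [1, 22, 44, 65, 73, 87, 104, 130, 151, 172, 194, 210,
             235, 255, 295, 320, 445, 520]
rows = count_in_windows(f"NC_011163.1:{p}" for p in positions)
for label, n in rows:
    print(f"{label}\t{n}")
```

As with the awk version, windows that contain no values are simply left out. To read from a file instead, pass an open file handle as the lines argument.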