Kayan - 4 months ago 20
Bash Question

# Calculating average in irregular intervals without considering missing values in shell script?

I have a dataset in which missing values are recorded as -999. Part of the data is:

``````input.txt
30
-999
10
40
23
44
-999
-999
31
-999
54
-999
-999
-999
-999
-999
-999
10
23
2
5
3
8
8
7
9
6
10
and so on
``````

I would like to calculate the average over consecutive intervals of 5, 6 and 6 rows (the pattern then repeats), ignoring the missing values.

The desired output is:

``````ofile.txt
25.75   (i.e. consider first 5 rows and take average without considering missing values, so (30+10+40+23)/4)
43      (i.e. consider next 6 rows and take average without considering missing values, so (44+31+54)/3)
-999    (i.e. consider next 6 and take average without considering missing values. Since all are missing, so write as a missing value -999)
8.6     (i.e. consider next 5 rows and take average (10+23+2+5+3)/5)
8     (i.e. consider next 6 rows and take average)
``````

I can do this for a regular interval (say, 5 rows) with:

``````
awk '!/\-999/{sum += $1; count++} NR%5==0{print count ? (sum/count) : -999; sum=count=0}' input.txt
``````

With AWK

``````
awk -v f="5" 'f&&f--&&$0!=-999{c++;v+=$0} NR%17==0{f=5;r++}
!f&&NR%17!=0{f=6;r++} r&&!c{print -999;r=0} r&&c{print v/c;r=v=c=0}
END{if(c!=0)print v/c}' input.txt
``````

Output

``````
25.75
43
-999
8.6
8
``````

Breakdown

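``````
f&&f--&&$0!=-999{c++;v+=$0} # while rows remain in the interval (f counts down), add valid values and increment the count
NR%17==0{f=5;r++}           # every 17 rows (5+6+6) the pattern restarts: the next interval has 5 rows
!f&&NR%17!=0{f=6;r++}       # interval finished mid-pattern: the next interval has 6 rows
r&&!c{print -999;r=0}       # interval finished with no valid values: print -999
r&&c{print v/c;r=v=c=0}     # interval finished: print the average and reset
END{
if(c!=0)                    # print the average of any remaining values
print v/c
}
``````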
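
If the interval pattern may change, the same idea can be written with the group sizes passed in as a parameter instead of hard-coding 5, 6 and 17. The following is only a sketch, not part of the answer above: the function name `group_avg` and the awk variables `sizes`, `missing`, `rows` and `g` are made up for illustration, and it assumes one numeric value per line with no blank lines. Unlike the one-liner, it also prints -999 for a trailing partial group that contains only missing values.

```shell
# Hypothetical helper: group_avg SIZES [FILE...]
# SIZES is a comma-separated, repeating list of group sizes, e.g. 5,6,6.
# Reads FILE (or standard input) and treats -999 as the missing marker.
group_avg() {
    sizes=$1
    shift
    awk -v sizes="$sizes" -v missing=-999 '
    BEGIN { n = split(sizes, size, ","); g = 1 }
    # add the value to the running sum unless it is the missing marker
    $1 != missing { sum += $1; count++ }
    # close the group once it has received size[g] rows
    ++rows == size[g] {
        print (count ? sum / count : missing)
        sum = count = rows = 0
        g = g % n + 1   # move to the next size, wrapping around
    }
    # a trailing partial group is still reported
    END { if (rows) print (count ? sum / count : missing) }
    ' "$@"
}

# First 11 rows of the sample data: prints 25.75 and then 43
printf '%s\n' 30 -999 10 40 23 44 -999 -999 31 -999 54 | group_avg 5,6,6
```

For the file in the question the call would be `group_avg 5,6,6 input.txt`. Keeping a row counter per group avoids relying on `NR%17`, so a different pattern such as `3,4` should need no changes to the awk program.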