Manolete - 1 month ago
Bash Question

Calculate the average over a number of rows

I am trying to create a script that calculates the average over a number of rows.

That number depends on the number of samples I have, which varies.

Here is an example of one of these files:

24 1 2.505
24 2 0.728
24 3 0.681
48 1 2.856
48 2 2.839
48 3 2.942
96 1 13.040
96 2 12.922
96 3 13.130
192 1 50.629
192 2 51.506
192 3 51.016


The average is calculated on the 3rd column, and the second column indicates the sample number; there are 3 samples in this particular case.

Therefore, I should obtain 4 values here: one average per group of 3 rows.
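To illustrate the "one average per group of 3 rows" idea, here is a minimal sketch that assumes every group has exactly `count` rows (3 here) and that the file has no header line. It averages the 3rd column over each consecutive block of rows by testing `NR % count`. The temporary sample file and the function name `avg_fixed` are hypothetical, added only so the example runs on its own.

```shell
#!/usr/bin/env bash
# Hypothetical sample file, created only so the sketch runs on its own.
file=$(mktemp)
trap 'rm -f "$file"' EXIT
cat > "$file" <<'EOF'
24 1 2.505
24 2 0.728
24 3 0.681
48 1 2.856
48 2 2.839
48 3 2.942
96 1 13.040
96 2 12.922
96 3 13.130
192 1 50.629
192 2 51.506
192 3 51.016
EOF

# avg_fixed COUNT FILE (hypothetical helper): average the 3rd column over
# consecutive blocks of COUNT rows and print "<1st column> <average>" per block.
# Assumes every group has exactly COUNT rows and there is no header line.
avg_fixed() {
  awk -v count="$1" '
    { sum += $3 }              # accumulate the 3rd column
    NR % count == 0 {          # last row of the current block
      print $1, sum / count    # group key and its average
      sum = 0                  # reset for the next block
    }
  ' "$2"
}

avg_fixed 3 "$file"
```

Because it relies on a fixed block size, this gives wrong results as soon as one group is missing a sample; grouping on the value in the 1st column avoids that assumption.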

I have tried something like:

count=3
total=0

for i in $( awk '{ print $3; }' "${file}" )
do
    for j in 1 2 3
    do
        total=$(echo "$total+$i" | bc)
    done
    echo "scale=2; $total / $count" | bc
done


But it is not giving me the right answer: instead of one average per group of three rows, it prints one value per row.
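For comparison, here is a minimal sketch that keeps the original `for ... in $(awk ...)` shape but loops over the distinct keys in the 1st column instead of over individual values, so each iteration produces one group average. It assumes the file is sorted by the 1st column, so that `uniq` lists each key once; the temporary file and the function name `avg_per_key` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sample file, created only so the sketch runs on its own.
file=$(mktemp)
trap 'rm -f "$file"' EXIT
cat > "$file" <<'EOF'
24 1 2.505
24 2 0.728
24 3 0.681
48 1 2.856
48 2 2.839
48 3 2.942
96 1 13.040
96 2 12.922
96 3 13.130
192 1 50.629
192 2 51.506
192 3 51.016
EOF

# avg_per_key FILE (hypothetical helper): loop over the distinct keys in the
# 1st column and average the 3rd column of each group.
# Assumes FILE is sorted by the 1st column so that uniq lists each key once.
avg_per_key() {
  local key
  for key in $(awk '{ print $1 }' "$1" | uniq); do
    # Sum and count only the rows of this group, then print its average.
    awk -v k="$key" '$1 == k { total += $3; count++ } END { print k, total / count }' "$1"
  done
}

avg_per_key "$file"
```

This rereads the whole file once per group, which is fine for small files but slower than a single awk pass over the data.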

Expected output

24 1.3046
48 2.879
96 13.0306
192 51.0503

Answer

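Here is another approach, in awk. It accumulates the 3rd column for each group and prints the group's average whenever the value in the 1st column changes, plus once at the end for the last group: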

$ awk 'NR>1 && $1!=p{print p, s/c; c=s=0} {s+=$3;c++;p=$1} END {print p, s/c}' file
24 1.30467
48 2.879
96 13.0307
192 51.0503
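The same one-liner can be spread out with comments to show how it works. This sketch keeps its logic but swaps `print` for `printf "%.4f"` (an addition, not part of the original answer) to get a fixed number of decimals. Note that `printf` rounds, so the first group shows 1.3047 rather than the truncated 1.3046 from the expected output. The temporary file and the function name `avg_by_key` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sample file, created only so the sketch runs on its own.
file=$(mktemp)
trap 'rm -f "$file"' EXIT
cat > "$file" <<'EOF'
24 1 2.505
24 2 0.728
24 3 0.681
48 1 2.856
48 2 2.839
48 3 2.942
96 1 13.040
96 2 12.922
96 3 13.130
192 1 50.629
192 2 51.506
192 3 51.016
EOF

# avg_by_key FILE (hypothetical helper): one average of the 3rd column per
# run of equal values in the 1st column, printed with 4 decimals.
avg_by_key() {
  awk '
    NR > 1 && $1 != prev {                 # key changed: previous group is complete
      printf "%s %.4f\n", prev, sum / n
      sum = n = 0                          # reset the accumulators
    }
    {
      sum += $3                            # accumulate the 3rd column
      n++                                  # rows seen in the current group
      prev = $1                            # remember the current key
    }
    END { printf "%s %.4f\n", prev, sum / n }   # flush the last group (assumes non-empty input)
  ' "$1"
}

avg_by_key "$file"
```

Unlike a fixed block size, keying on the 1st column still works when groups have different numbers of samples, as long as rows of the same group are adjacent.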