Manolete - 1 year ago
Bash Question

# Calculate the average over a number of rows

I am trying to write a script that calculates the average over a fixed number of rows.

That number depends on how many samples I have, which varies.

An example of these files is here:

```
24  1  2.505
24  2  0.728
24  3  0.681
48  1  2.856
48  2  2.839
48  3  2.942
96  1  13.040
96  2  12.922
96  3  13.130
192 1  50.629
192 2  51.506
192 3  51.016
```

The average is calculated on the third column, and the second column indicates the sample number (3 samples in this particular case).

Therefore, I should obtain 4 values here: one average per 3 rows.

I have tried something like:

```
count=3;
total=0;

for i in $( awk '{ print $3; }' ${file} )
do
    for j in 1 2 3
    do
        total=$(echo $total+$i | bc )
    done
    echo "scale=2; $total / $count" | bc
done
```

But it is not giving me the right answer. Instead, I think it calculates an average for each group of three rows.

Expected output:

```
24  1.3046
48  2.879
96  13.0306
192 51.0503
```
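
As far as I can tell, the loop in the question goes wrong for two reasons: the inner `for j in 1 2 3` adds the same value `$i` three times, and `total` is never reset, so each printed number is effectively a running sum of the whole column rather than a per-group mean. One possible fix is sketched below. It assumes every group consists of exactly `count` consecutive rows, and the function name `block_average` is hypothetical (it is not from the original post).

```bash
#!/usr/bin/env bash
# Sketch: print one average of column 3 per block of N consecutive rows.
# block_average is a hypothetical helper name used only for illustration.
# Assumption: every group has exactly N rows and the rows are contiguous.
block_average() {
    # $1 = input file, $2 = number of rows per block (the sample count)
    awk -v n="$2" '
        NF >= 3 { sum += $3; rows++ }   # add each value exactly once; skip short lines
        rows == n {                     # a full block has been read
            printf "%s %.4f\n", $1, sum / n
            sum = 0; rows = 0           # reset before the next block starts
        }
    ' "$1"
}
```

It could be called as `block_average "$file" 3`. Note that `printf "%.4f"` rounds, so the first line would read `24 1.3047` rather than the truncated `1.3046` shown in the expected output.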

```
$ awk 'NR>1 && $1!=p{print p, s/c; c=s=0} {s+=$3;c++;p=$1} END {print p, s/c}' file