Kayan Kayan - 6 months ago 11
Bash Question

Compute annual average from a time series with a leap-year calendar using a shell script

I have a daily time series dataset covering 10 years (1995-2004), with missing values coded as 9999.00. I would like to compute the annual average for each year, ignoring the missing values.

I was able to do this for a 365-day calendar with the following command:

awk '!/9999\.00/{sum += $1; count++} NR%365==0{print (count ? sum/count : "9999.00"); sum=count=0}' ifile


But I am not able to adapt it to a leap-year calendar. I also need to add a column with the year. My desired output is:

1995 annual_average
1996 annual_average
1997 annual_average
....


For example:
I have the following data for 1995-2000. As a small test, I want to average every 3 lines instead of 365, and every 4 lines instead of 366 in a leap year:

3
3
4
9999.00
4
9999.00
13
3
9999.00
9999.00
9999.00
9999.00
9999.00
3
4
2
2
2.6
5.1
4.5


Trial command:

awk '!/9999\.00/{sum += $1; count++} NR%3==0{print (count ? sum/count : "9999.00"); sum=count=0}' ifile


Desired output:

1995 3.33
1996 8.5 (leap year, so the average is over 4 lines, ignoring missing values: (4+13)/2)
1997 3
1998 9999.00
1999 3
2000 3.55 (leap year)

Answer

This code works for your sample data, where a common year is 3 lines and a leap year is 4. For real daily data you will need to adjust the target values to 365 and 366:

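BEGIN {
    year = 0;       # years completed so far (0 corresponds to 1995)
    target = 3;     # line number on which the current year ends
}
# Accumulate only real values; 9999.00 marks a missing value.
$1 < 9990.00 {
    sum += $1;
    count++;
}
# Last line of the current year: print its average and reset.
NR == target {
    if (count == 0) {
        print "9999";
    } else {
        print sum / count;
    }
    sum = 0;
    count = 0;
    year++;
    # With 1995 as year 0, the leap years 1996 and 2000 are years 1 and 5.
    if (year % 4 == 1) {
        target += 4;
    } else {
        target += 3;
    }
}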
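
If you also want the year column and do not want to hard-code which years are leap years, the same idea can be wrapped in a small shell function. This is only a sketch under a few assumptions: the function name annual_mean and its argument order (start year, lines per common year, lines per leap year, then an optional file) are made up for illustration, the input has one value per line with no gaps in the calendar, and blank lines are skipped rather than counted as days. It prints averages with two decimals, so 8.5 comes out as 8.50.

```shell
#!/bin/sh
# Usage (hypothetical helper): annual_mean START_YEAR COMMON_DAYS LEAP_DAYS [FILE]
# Prints "year average" per calendar year; 9999.00 is treated as missing.
annual_mean() {
    start=$1
    common=$2
    leap=$3
    shift 3
    awk -v year="$start" -v common="$common" -v leap="$leap" -v missing=9999 '
        # Number of input lines that make up year y (Gregorian leap rule).
        function days_in(y) {
            if ((y % 4 == 0 && y % 100 != 0) || y % 400 == 0) return leap
            return common
        }
        # Print the year collected so far, then reset the accumulators.
        function flush_year() {
            if (count > 0)
                printf "%d %.2f\n", year, sum / count
            else
                printf "%d %.2f\n", year, missing
            sum = count = seen = 0
            year++
        }
        NF == 0 { next }    # assumption: blank lines are not days
        {
            seen++
            if ($1 + 0 != missing) { sum += $1; count++ }
            if (seen == days_in(year)) flush_year()
        }
        # Report a trailing partial year if the input stops mid-year.
        END { if (seen > 0) flush_year() }
    ' "$@"
}
```

For the sample data in the question you would call it as `annual_mean 1995 3 4 ifile`, and for the real daily series as `annual_mean 1995 365 366 ifile`. With the 20 sample lines the first call prints 1995 3.33, 1996 8.50, 1997 3.00, 1998 9999.00, 1999 3.00 and 2000 3.55, one per line.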