Kayan - 1 year ago 49

Bash Question

I have a daily time series covering 10 years (1995-2004), in which missing values are coded as 9999.00. I would like to compute the annual average for each year, ignoring the missing values.

I was able to do this for a 365-day calendar with the following command:

`awk '!/9999\.00/{sum += $1; count++} NR%365==0{print count ? sum/count : "9999.00"; sum=count=0}' ifile`

But I have not been able to adapt it to a calendar with leap years. I also need to add a column with the year. My desired output is:

`1995 annual_average`

`1996 annual_average`

`1997 annual_average`

....

For example:

Suppose I have the following data for 1995-2000, where each common year has 3 lines instead of 365 and each leap year has 4 lines instead of 366. I need to average every 3 lines, or every 4 lines if it is a leap year:

`3`

`3`

`4`

`9999.00`

`4`

`9999.00`

`13`

`3`

`9999.00`

`9999.00`

`9999.00`

`9999.00`

`9999.00`

`3`

`4`

`2`

`2`

`2.6`

`5.1`

`4.5`

Trial command:

`awk '!/9999\.00/{sum += $1; count++} NR%3==0{print count ? sum/count : "9999.00"; sum=count=0}' ifile`

Desired output:

`1995 3.33`

`1996 8.5` (a leap year, so the average of 4 lines, ignoring missing values: (4+13)/2)

`1997 3`

`1998 9999.00`

`1999 3`

`2000 3.55` (leap year)

Answer

This code works for your sample data. For real data you will need to adjust the `target` values (365 lines for a common year, 366 for a leap year):

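```
BEGIN {
    year = 1995;    # first year in the file
    target = 3;     # line number on which the first year ends
}
# Accumulate every value that is not the 9999.00 missing-value marker.
$1 < 9990.00 {
    sum += $1;
    count++;
}
# The current year is complete: print its average and move on.
NR == target {
    if (count == 0) {
        print year, "9999.00";
    } else {
        print year, sum / count;
    }
    sum = 0;
    count = 0;
    year++;
    # Sample data: 4 lines in a leap year, 3 otherwise (366 and 365 for
    # real data). The simple % 4 test is valid for 1901-2099.
    if (year % 4 == 0) {
        target += 4;
    } else {
        target += 3;
    }
}
```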
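For real daily data, the same idea could be wrapped in a small shell function. This is only a minimal sketch, not part of the original answer. It assumes one value per line, a file that starts on 1 January of the first year, and no missing lines within a year. The function name `annual_mean` and its three arguments (start year, lines per common year, lines per leap year) are made up for illustration. The lengths default to 365 and 366, and the full Gregorian leap-year rule is used in place of a plain `% 4` test.

```sh
# annual_mean START_YEAR [COMMON_LEN [LEAP_LEN]] < daily_values
# Hypothetical helper: prints "year mean" for each complete year on stdin.
annual_mean() {
    awk -v year="${1:-1995}" -v common="${2:-365}" -v leap="${3:-366}" '
    # Gregorian rule: divisible by 4, except centuries not divisible by 400.
    function days_in(y) {
        return (y % 4 == 0 && (y % 100 != 0 || y % 400 == 0)) ? leap : common
    }
    BEGIN { left = days_in(year) }       # lines remaining in the current year
    NF == 0 { next }                     # assumption: blank lines are not days
    $1 != 9999 { sum += $1; count++ }    # skip the 9999.00 missing-value marker
    {
        if (--left == 0) {               # last line of the current year
            if (count) {
                printf "%d %.2f\n", year, sum / count
            } else {
                printf "%d 9999.00\n", year
            }
            sum = count = 0
            left = days_in(++year)
        }
    }'
}
```

With the real file this would be called as `annual_mean 1995 < ifile`. For the shortened sample above, `annual_mean 1995 3 4 < ifile` should print `1995 3.33`, `1996 8.50`, `1997 3.00`, `1998 9999.00`, `1999 3.00` and `2000 3.55`. Averages are formatted to two decimals, so `8.5` appears as `8.50`. A trailing partial year is not printed.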