Paul Greeley - 1 year ago 104
R Question

# R: tapply(x,y,sum) returns NA instead of 0

I have a data set that contains occurrences of events over multiple years, regions, quarters, and types. Sample:

``````REGION Prov Year Quarter Type Hit Miss
xxx     yy  2008  4     Snow  1   0
xxx     yy  2009  2     Rain  0   1
``````

I have variables defined to examine the columns of interest:

``````syno.h <- data$Hit
quarter.number <- data$Quarter
syno.wrng <- data$Type
``````

I wanted to get the number of hits per type and quarter across all of the data. Since each Hit value is either 0 or 1, my first attempt was a simple sum() via tapply.

``````tapply(syno.h, list(syno.wrng, quarter.number), sum)
``````

This returned:

``````              1   2   3   4
ARCO         NA  NA  NA   0
BLSN          0  NA  15  74
BLZD          4  NA  17  54
FZDZ         NA  NA   0   1
FZRA         26   0 143 194
RAIN        106 126 137 124
SNOW         43   2 215 381
SNSQ          0  NA  18  53
WATCHSNSQ    NA  NA  NA   0
WATCHWSTM     0  NA  NA  NA
WCHL         NA  NA  NA   1
WIND         47  38 155 167
WIND-SUETES  27   6  37  56
WIND-WRECK   34  14  44  58
WTSM          0   1   7  18
``````

For some of the types that have no occurrences in a given quarter, tapply returns NA instead of zero. I have checked the data a number of times, and I am confident that it is clean. The values that aren't NA are also correct.

If I check individual type/quarter combinations, including ones where tapply returns NA, by calling sum() directly, I get the values I expect:

``````> sum(syno.h[quarter.number == 3 & syno.wrng == "BLSN"])
[1] 15
> sum(syno.h[quarter.number == 1 & syno.wrng == "BLSN"])
[1] 0
> sum(syno.h[quarter.number == 2 & syno.wrng == "BLSN"])
[1] 0
> sum(syno.h[quarter.number == 2 & syno.wrng == "ARCO"])
[1] 0
``````

It seems that my issue is with how I use tapply with sum, and not with the data itself.

Does anyone have any suggestions on what the issue may be?

I have two potential solutions for you, depending on exactly what you are looking for. If you are only interested in the number of `Hit`s per `Type` and `Quarter`, and don't need entries for combinations that never occur in the data, you can get an answer with

``````aggregate(data[["Hit"]], by = data[c("Type", "Quarter")], FUN = sum)
``````
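To make the behaviour of `aggregate` concrete, here is a minimal sketch on a made-up data frame. The values are hypothetical and only the column names `Type`, `Quarter` and `Hit` come from the question. It shows that `aggregate` returns one row per combination that actually occurs in the data (including combinations whose sum is 0), and no row at all for combinations that never occur.

```r
# Hypothetical toy data; only the column names mirror the question
data <- data.frame(
  Type    = c("SNOW", "SNOW", "RAIN", "RAIN", "BLSN"),
  Quarter = c(4, 4, 2, 4, 4),
  Hit     = c(1, 0, 0, 1, 0)
)

# One row per Type/Quarter combination present in the data;
# the summed hits end up in a column named "x"
res <- aggregate(data[["Hit"]], by = data[c("Type", "Quarter")], FUN = sum)
print(res)

# SNOW never occurs in quarter 2, so there is no row for that combination
any(res$Type == "SNOW" & res$Quarter == 2)
```

The result is in long format, which is convenient for further filtering or merging, but it is not the Type-by-Quarter grid that `tapply` produces.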

If it is important to keep a record of the combinations with no hits as well, you can use

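``````dataHit <- data[data[["Hit"]] == 1, ]
# Take the factor levels from the full data so that empty combinations are counted as 0
dataHit[["Type"]] <- factor(dataHit[["Type"]], levels = levels(factor(data[["Type"]])))
dataHit[["Quarter"]] <- factor(dataHit[["Quarter"]], levels = levels(factor(data[["Quarter"]])))
table(dataHit[["Type"]], dataHit[["Quarter"]])
``````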
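As for why the NAs appear in the first place: `tapply` only calls `sum` on cells that contain at least one row, and fills the remaining Type/Quarter cells with `NA`. Subsetting by hand gives 0 because `sum` of an empty vector is 0. The sketch below reproduces this on hypothetical toy data (only the column names come from the question) and shows two ways to get zeros instead, assuming the `Hit` column itself contains no missing values.

```r
# Hypothetical toy data; only the column names mirror the question
data <- data.frame(
  Type    = c("SNOW", "SNOW", "RAIN", "RAIN", "BLSN"),
  Quarter = c(4, 4, 2, 4, 4),
  Hit     = c(1, 0, 0, 1, 0)
)

# Cells with no rows (e.g. SNOW in quarter 2) come back as NA,
# because sum() is never called for them
hits <- tapply(data$Hit, list(data$Type, data$Quarter), sum)

# Option 1: replace the NA cells afterwards
# (this would also overwrite NAs caused by missing Hit values)
hits_zero <- hits
hits_zero[is.na(hits_zero)] <- 0

# Option 2: xtabs() sums Hit per cell and reports 0 for empty cells
hits_xtabs <- xtabs(Hit ~ Type + Quarter, data = data)

print(hits)
print(hits_zero)
print(hits_xtabs)
```

In R 3.4.0 and later, `tapply` should also accept a `default` argument, so `tapply(..., sum, default = 0)` is another option if your R version supports it.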