Dmitry Malugin - 2 months ago
R Question

What's wrong with tapply (args are unequal length) in this case?

The data was taken from http://open.canada.ca/data/en/dataset/b52664cf-bfd9-49ad-849a-cb88c92553b9 (English version).

glacier <- read.csv("glacier.csv", stringsAsFactors = F)
str(glacier)
'data.frame': 518 obs. of 6 variables:
$ Ref_Date : int 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 ...
$ GEO : chr "Helm Glacier - southern Coast Mountains (Garibaldi Provincial Park), British Columbia" "Helm Glacier - southern Coast Mountains (Garibaldi Provincial Park), British Columbia" "Helm Glacier - southern Coast Mountains (Garibaldi Provincial Park), British Columbia" "Helm Glacier - southern Coast Mountains (Garibaldi Provincial Park), British Columbia" ...
$ MEASURE : chr "Annual mass balance" "Annual mass balance" "Annual mass balance" "Annual mass balance" ...
$ Vector : chr "v54326054" "v54326054" "v54326054" "v54326054" ...
$ Coordinate: num 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 ...
$ Value : chr "-1460.0" "-780.0" "-2730.0" "-940.0" ...

tapply(X = as.numeric(glacier[glacier$MEASURE == "Annual mass balance", c("Value")]),
INDEX = unique(glacier[ , 2]), FUN = median, na.rm = T)


gives the error:
Error in tapply(as.numeric(glacier[glacier$MEASURE == "Annual mass balance", :
arguments must have same length
I've checked the arguments and they seem quite normal for the tapply function. I have no idea what's wrong. Thanks in advance.

EDIT:

tapply(X = as.numeric(glacier[glacier$MEASURE == "Annual mass balance", c("Value")]),
INDEX = glacier[ , 2], FUN = median, na.rm = T)


gives the same error

Answer

You are subsetting the glacier data frame, and I would expect one or more rows to have glacier$MEASURE != "Annual mass balance". Thus the length of glacier[glacier$MEASURE == "Annual mass balance", c("Value")] is not equal to the length of glacier[ , 2]. If this is true, you need to subset the index as well. The same applies to your first attempt: unique(glacier[ , 2]) returns only one entry per distinct glacier, whereas tapply needs one INDEX entry for every element of X.
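To see the mismatch concretely, here is a minimal sketch on a made-up data frame. The name `toy` and all of its values are assumptions for illustration, not the real glacier data; only the column names mirror the question.

```r
# Hypothetical toy data standing in for the glacier file (values are made up)
toy <- data.frame(
  GEO     = c("A", "A", "B", "B", "A", "B"),
  MEASURE = c("Annual mass balance", "Annual mass balance",
              "Annual mass balance", "Annual mass balance",
              "Other measure", "Other measure"),
  Value   = c("-10", "-30", "5", "15", "100", "200"),
  stringsAsFactors = FALSE
)

# X is built from the filtered rows only
x <- as.numeric(toy[toy$MEASURE == "Annual mass balance", "Value"])

length(x)                 # 4: only the rows that pass the filter
length(toy$GEO)           # 6: every row of the data frame
length(unique(toy$GEO))   # 2: one entry per distinct group

# tapply() requires INDEX to be as long as X, so the unfiltered index fails
res <- try(tapply(X = x, INDEX = toy$GEO, FUN = median), silent = TRUE)
inherits(res, "try-error")  # TRUE
```

Comparing the three lengths before calling tapply is usually the quickest way to spot this kind of error.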

Here is a solution that first creates a subset of your data, which improves readability:

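glacier <- read.csv("01530102-eng.csv", stringsAsFactors = FALSE)

# Keep only the "Annual mass balance" rows, so X and INDEX come from the same rows
glacierreduced <- glacier[glacier$MEASURE == "Annual mass balance", ]
# Median of Value per glacier (column 2 is GEO)
tapply(X = as.numeric(glacierreduced$Value), INDEX = glacierreduced[ , 2],
       FUN = median, na.rm = TRUE)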
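If you would rather not create an intermediate data frame, the same idea can be sketched by applying one logical mask to both X and INDEX. The data frame `toy`, its values, and the name `keep` below are made up for illustration and are not part of the original dataset.

```r
# Hypothetical toy data (values are made up; column names mirror the question)
toy <- data.frame(
  GEO     = c("A", "A", "B", "B", "A", "B"),
  MEASURE = c("Annual mass balance", "Annual mass balance",
              "Annual mass balance", "Annual mass balance",
              "Other measure", "Other measure"),
  Value   = c("-10", "-30", "5", "15", "100", "200"),
  stringsAsFactors = FALSE
)

# One logical mask, reused for both arguments, guarantees equal lengths
keep <- toy$MEASURE == "Annual mass balance"

res <- tapply(X = as.numeric(toy$Value[keep]),
              INDEX = toy$GEO[keep],
              FUN = median, na.rm = TRUE)
res
# A: median(-10, -30) = -20; B: median(5, 15) = 10
```

Because X and INDEX are filtered by the same mask, they cannot drift out of step if the filter condition is changed later.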