Daniel - 19 days ago
R Question

Counting 0's, 1's, 99's and NA's for each variable in a data frame

I have a data frame with 118 variables containing 0's, 1's, 99's and NA's. I need to count, for each variable, how many 99's, NA's, 1's and 0's there are (99 is "not apply", 0 is "no", 1 is "yes" and NA is "No answer"). I tried to do this with the table function, but it works on vectors. How can I do it for the whole set of variables?

Here is a small reproducible example of the data frame:

forest<-c(1,1,1,1,0,0,0,1,1,1,0,NA,0,NA,0,99,99,1,0,NA)
water<-c(1,NA,NA,NA,NA,99,99,0,0,0,1,1,1,0,0,NA,NA,99,1,0)
rain<-c(1,NA,1,0,1,99,99,0,1,0,1,0,1,0,0,NA,99,99,1,1)
fire<-c(1,0,0,0,1,99,99,NA,NA,NA,1,0,1,0,0,NA,99,99,1,1)

df<-data.frame(forest,water,rain,fire)


I need to write the result for each variable into a data frame, like this:

   forest water rain fire
1       8     5    8    6
0       7     6    6    6
99      2     3    4    4
NA      3     6    2    4

Answer

Can't find a good dupe, so here's my comment as an answer:

A data frame is really a list of columns. lapply will apply a function to every item in the input (every column, in the case of a data frame) and return a list with each result:

lapply(df, table)
# $forest
# 
#  0  1 99 
#  7  8  2 
# 
# $water
# 
#  0  1 99 
#  6  5  3 
# 
# $rain
# 
#  0  1 99 
#  6  8  4 
# 
# $fire
# 
#  0  1 99 
#  6  6  4 
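
As a rough sketch of what that returned list looks like in practice (using only the first two columns from the question; the name counts is just illustrative), each element is an ordinary table that can be pulled out by column name. It also shows why no NA count appears above: table drops NA by default.

```r
# First two columns of the question's example data.
forest <- c(1,1,1,1,0,0,0,1,1,1,0,NA,0,NA,0,99,99,1,0,NA)
water <- c(1,NA,NA,NA,NA,99,99,0,0,0,1,1,1,0,0,NA,NA,99,1,0)
df <- data.frame(forest, water)

# One table per column, returned as a list named after the columns.
counts <- lapply(df, table)

class(counts)          # "list"
names(counts)          # "forest" "water"
counts$forest[["1"]]   # number of 1's in forest: 8
sum(counts$forest)     # 17, not 20: the 3 NA's are dropped by default
```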

sapply is like lapply, but it will attempt to simplify the result instead of always returning a list. In both cases, you can pass along additional arguments to the function being applied, like useNA = "always" to table to have NA included in the output:

sapply(df, table, useNA = "always")
#      forest water rain fire
# 0         7     6    6    6
# 1         8     5    8    6
# 99        2     3    4    4
# <NA>      3     6    2    4
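
One caveat with 118 variables: sapply can only simplify to a matrix when every column produces a table of the same length, so a column that happens to contain no 99's (for example) would leave you with a list instead. A minimal sketch of one way around that, assuming the only valid codes are 0, 1 and 99, is to fix the levels up front. The helper name count_codes is hypothetical, not part of base R; the order of the codes also sets the row order, here matching the layout asked for in the question.

```r
# Example data from the question.
forest <- c(1,1,1,1,0,0,0,1,1,1,0,NA,0,NA,0,99,99,1,0,NA)
water <- c(1,NA,NA,NA,NA,99,99,0,0,0,1,1,1,0,0,NA,NA,99,1,0)
rain <- c(1,NA,1,0,1,99,99,0,1,0,1,0,1,0,0,NA,99,99,1,1)
fire <- c(1,0,0,0,1,99,99,NA,NA,NA,1,0,1,0,0,NA,99,99,1,1)
df <- data.frame(forest, water, rain, fire)

# Hypothetical helper: count a fixed set of codes so that every column
# yields a table of the same length, even if a code never occurs in it.
# Assumption: 1, 0 and 99 are the only non-missing values of interest.
count_codes <- function(x, codes = c(1, 0, 99)) {
  table(factor(x, levels = codes), useNA = "always")
}

counts <- sapply(df, count_codes)             # 4 x 4 integer matrix
rownames(counts) <- c("1", "0", "99", "NA")   # plain labels, no missing row name
result <- as.data.frame(counts)
result
```

Setting the row names explicitly before calling as.data.frame avoids carrying a missing value around as a row label.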

For lots more info, check out "R Grouping functions: sapply vs. lapply vs. apply. vs. tapply vs. by vs. aggregate".


To compare with some other answers: apply is similar to lapply and sapply, but it is intended for use with matrices or higher-dimensional arrays. The only time you should use apply on a data.frame is when you need to apply a function to each row. For functions on data frame columns, prefer lapply or sapply. The reason is that apply will coerce the data frame to a matrix first, which can have unintended consequences if you have columns of different classes.
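
To make the coercion point concrete, here is a small sketch using two made-up data frames (mixed and answers are hypothetical, not from the question). With columns of different classes, apply sees a character matrix, while sapply sees each column as it is; the last call shows the row-wise case where apply is a reasonable fit.

```r
# Hypothetical data frame with one character column and one numeric column.
mixed <- data.frame(
  id = c("a", "b", "c"),
  answer = c(1, 0, 99),
  stringsAsFactors = FALSE
)

col_classes_sapply <- sapply(mixed, class)    # id: "character", answer: "numeric"
col_classes_apply <- apply(mixed, 2, class)   # both "character" after matrix coercion

# Row-wise use: count the 99's in each row of an all-numeric data frame.
answers <- data.frame(forest = c(1, 99, 0), water = c(99, 99, 1))
row_99 <- apply(answers, 1, function(row) sum(row == 99))   # values: 1 2 0
```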