akhil sood - 1 month ago
R Question

Frequency cross-tabulation in R for categorical variables

I have a table with over 200 categorical variables. Sample:

Cat1 Cat2 Cat3
A B A
B A A
A C A
A B A


I want to get the frequency (number of times) with which each category appears across the whole dataset. Something like this:


  • A 8

  • B 3
    ..



I am very new to R and tried using a for loop to get the result, but I am sure there are better ways to do it. Can you please help me with this?

Answer

In general, the most convenient function for counting how many tokens of each type you have is table() (see ?table):

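# stringsAsFactors=TRUE stores the columns as factors, which summary() needs in
# order to report counts (the default became FALSE in R 4.0.0)
d <- read.table(text="Cat1 Cat2 Cat3
A B A
B A A
A C A
A B A", header=TRUE, stringsAsFactors=TRUE)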
table(d$Cat1)
# A B 
# 3 1 
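The object table() returns behaves much like a named vector of counts. As a small sketch (rebuilding the sample data from the question), you can look up a single category by name, sort the categories by frequency, or turn the table into a data frame with as.data.frame(), which names the columns Var1 and Freq by default:

```r
# Sample data from the question, with the columns stored as factors.
d <- read.table(text = "Cat1 Cat2 Cat3
A B A
B A A
A C A
A B A", header = TRUE, stringsAsFactors = TRUE)

tab <- table(d$Cat1)

# Look up the count for one category by name.
tab[["A"]]
# 3

# Order the categories by count, most frequent first.
sort(tab, decreasing = TRUE)

# Convert to a two-column data frame (Var1 = category, Freq = count).
as.data.frame(tab)
```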

The most convenient way to run table() on every categorical (factor) variable in a data frame at once is summary() (see ?summary.data.frame):

summary(d)
#  Cat1  Cat2  Cat3 
#  A:3   A:1   A:4  
#  B:1   B:2        
#        C:1        
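One caveat with summary(): it shows at most maxsum entries per column (7 by default) and lumps the least frequent levels into "(Other)", so with many levels the output is incomplete, and with 200 columns it is also very wide. A sketch of an alternative, assuming the same data frame d, is lapply(d, table), which returns one complete frequency table per column as a named list:

```r
# Sample data from the question, with the columns stored as factors.
d <- read.table(text = "Cat1 Cat2 Cat3
A B A
B A A
A C A
A B A", header = TRUE, stringsAsFactors = TRUE)

# Call table() on each column; the result is a named list of tables,
# and no levels are collapsed into "(Other)".
per_column <- lapply(d, table)

per_column$Cat2
# A B C 
# 1 2 1 

# If you prefer summary()'s layout, raise maxsum above the largest number of levels.
summary(d, maxsum = 50)
```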

On the other hand, if you want a single table that pools the counts across all categorical variables, you can combine table() with unlist() (see ?unlist):

table(unlist(d))
# A B C 
# 8 3 1 

To understand what's happening there, note that in R a data frame is a special kind of list: each variable is a vector, and the data frame is a list of vectors of equal length (cf. here). The unlist() function concatenates those vectors, first to last, into one long vector, which table() then counts. Note that if you have some non-categorical variables mixed in, you will need to exclude them first, with something like table(unlist(d[,c(<variables to use>)])).
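To make that concrete, here is a sketch, again using the sample data, that checks the list structure and then pools only the factor columns. The numeric column Score is hypothetical, added only to show a non-categorical variable being excluded; vapply(d, is.factor, logical(1)) is one way to pick the categorical columns automatically instead of listing them by hand. When all the pooled columns are factors, unlist() returns a factor whose levels are the union of the columns' levels:

```r
# Sample data from the question, with the columns stored as factors.
d <- read.table(text = "Cat1 Cat2 Cat3
A B A
B A A
A C A
A B A", header = TRUE, stringsAsFactors = TRUE)

# A data frame is a list with one equal-length vector per column.
is.list(d)
# TRUE
length(d)
# 3

# Hypothetical numeric column, added only to show why it must be excluded.
d$Score <- c(10, 20, 30, 40)

# Flag the factor columns, keep only those (d[is_cat] always returns a
# data frame), then pool their values and count them.
is_cat <- vapply(d, is.factor, logical(1))
pooled <- table(unlist(d[is_cat]))

pooled
# A B C 
# 8 3 1 
```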