timtom - 3 months ago 15
R Question

Summarize distribution of factors in R data frame

Say I have a data.frame like this:

X1 X2 X3
1 A B A
2 A C B
3 B A B
4 A A C


I would like to count the occurrences of A, B, C, etc. in each column, and return the result as

A_count B_count C_count
X1 3 1 0
X2 2 1 1
X3 1 2 1


I'm sure this question has a thousand duplicates, but I can't seem to find an answer that works for me :(

By running

apply(mydata, 2, table)


I get something like

$X1
B A
1 3
$X2
A C B
2 1 1


But that's not exactly what I want, and if I try to build it back into a data frame it doesn't work, because I don't get the same number of columns for every row (like $X1 above, where there are no C's).

What am I missing?

Many thanks!

Answer

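You can re-create each column as a factor whose levels are the full set of values found across all columns, then tabulate. With fixed levels, table() reports a zero for any value a column lacks, so every row has the same length. I would also recommend using lapply() instead of apply(), as apply() is meant for matrices and coerces a data frame with as.matrix() first.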

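# levels() below needs factor columns; since R 4.0.0 read.table() returns
# character columns unless stringsAsFactors = TRUE is set
df <- read.table(text = "X1   X2   X3
1 A    B    A
2 A    C    B
3 B    A    B
4 A    A    C", header = TRUE, stringsAsFactors = TRUE)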

do.call(
    rbind, 
    lapply(df, function(x) table(factor(x, levels=levels(unlist(df)))))
)
#    A B C
# X1 3 1 0
# X2 2 1 1
# X3 1 2 1
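
The question asked for a data frame with columns named A_count, B_count and C_count, so here is a minimal sketch of one way to get there from the same toy data. The names lv, counts and result are made up for illustration. The levels are derived with unique() rather than levels(), which is an assumption on my part so that the sketch also works when the columns are plain character vectors.

```r
# Same toy data as in the question, built directly for a self-contained example
df <- data.frame(
  X1 = c("A", "A", "B", "A"),
  X2 = c("B", "C", "A", "A"),
  X3 = c("A", "B", "B", "C")
)

# All distinct values across every column, sorted: "A" "B" "C"
lv <- sort(unique(unlist(lapply(df, as.character))))

# One row of counts per column; the fixed levels keep zero counts in place
counts <- do.call(
  rbind,
  lapply(df, function(x) table(factor(x, levels = lv)))
)

# Convert the matrix to a data frame and rename the columns as in the question
result <- as.data.frame(counts)
names(result) <- paste0(colnames(counts), "_count")
result
#    A_count B_count C_count
# X1       3       1       0
# X2       2       1       1
# X3       1       2       1
```

If you only need the counts as a matrix, you can stop at counts and skip the conversion.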