akhil sood - 1 year ago 51

R Question

I have a table with over 200 categorical variables. Sample:

```
Cat1 Cat2 Cat3
A    B    A
B    A    A
A    C    A
A    B    A
```

I want to get the frequency (number of times) with which each category appears across the whole dataset. Something like this:

```
A 8
B 3
...
```

I am very new to R and tried using a for loop to get the result, but I am sure there are better ways to do this. Can you please help me with this?

Answer

In general, the most convenient function for counting how many tokens you have of each type is `table()` (see `?table`):

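```
# stringsAsFactors=TRUE makes the columns factors (the default is FALSE since R 4.0.0)
d <- read.table(text="Cat1 Cat2 Cat3
A B A
B A A
A C A
A B A", header=TRUE, stringsAsFactors=TRUE)
table(d$Cat1)
# A B
# 3 1
```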
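`table()` returns a named table of counts, so an individual count can be looked up by category name, and `as.data.frame()` turns the table into a plain two-column data frame. A small sketch, assuming the same four-row sample data as above:

```r
# Rebuild the sample data so this block runs on its own
d <- read.table(text = "Cat1 Cat2 Cat3
A B A
B A A
A C A
A B A", header = TRUE, stringsAsFactors = TRUE)

tab <- table(d$Cat1)
tab[["A"]]          # count for category "A": 3
as.data.frame(tab)  # one row per category, with its count in the Freq column
```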

The most convenient way to run `table()` on every categorical variable in a data frame is `summary()` (see `?summary.data.frame`), provided the columns are factors:

```
summary(d)
#  Cat1  Cat2  Cat3
#  A:3   A:1   A:4
#  B:1   B:2
#        C:1
```
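Note that `summary()` only reports counts for factor columns. Since R 4.0.0, `read.table()` defaults to `stringsAsFactors = FALSE`, so text columns come in as character vectors, and `summary()` then shows only their length, class and mode. An alternative that works for both factor and character columns is to apply `table()` to each column with `lapply()`. A minimal sketch, assuming the same sample data read without `stringsAsFactors`:

```r
# Without stringsAsFactors = TRUE the columns are character vectors (R >= 4.0.0)
d <- read.table(text = "Cat1 Cat2 Cat3
A B A
B A A
A C A
A B A", header = TRUE)

# One frequency table per column; the result is a list named after the columns
counts <- lapply(d, table)
counts$Cat2  # A: 1, B: 2, C: 1
```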

On the other hand, if you want a single table that collapses over all categorical variables, you can combine `table()` with `unlist()` (see `?unlist`):

```
table(unlist(d))
# A B C
# 8 3 1
```
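To get the counts in the order the question shows, with the most frequent category first, the combined table can be sorted with `sort(..., decreasing = TRUE)`. A short sketch on the same sample data:

```r
d <- read.table(text = "Cat1 Cat2 Cat3
A B A
B A A
A C A
A B A", header = TRUE, stringsAsFactors = TRUE)

# Collapse all columns into one vector, count, then sort by count (largest first)
freq <- sort(table(unlist(d)), decreasing = TRUE)
freq  # A 8, B 3, C 1
```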

To understand what's happening there, the thing to realize is that in `R` a data frame is a special kind of list: each variable is a vector, and the data frame is a list of vectors of equal length. The `unlist()` function turns those into one long vector, concatenated from the first column to the last. Note that if you have some non-categorical variables mixed in, you will need to exclude them with something like `table(unlist(d[,c(<variables to use>)]))`.
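To make that last point concrete, here is a minimal sketch with a hypothetical numeric column `Score` added to the sample data. Instead of listing the categorical columns by hand, it keeps only the factor columns with `Filter(is.factor, ...)`, which is one alternative to the column-selection approach above:

```r
# Sample data plus a hypothetical numeric column, Score
d <- read.table(text = "Cat1 Cat2 Cat3 Score
A B A 1.5
B A A 2.0
A C A 0.7
A B A 3.1", header = TRUE, stringsAsFactors = TRUE)

# unlist(d) on the whole frame would mix the Score values in with the categories,
# so keep only the factor (categorical) columns first
cat_cols <- Filter(is.factor, d)

# Collapse the remaining columns into one vector and count each category
table(unlist(cat_cols))  # A 8, B 3, C 1
```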