user3375672 - 1 year ago 72
R Question

# R: fill matrix based on occurrences in data frame variable

``````
set.seed(1)
names <- letters[1:3]

df <-
  data.frame(id = LETTERS[1:5],
             names = replicate(5, paste0(sample(names, sample(1:3)), collapse = ',')),
             stringsAsFactors = FALSE)
``````

Then each id in `df` is associated with 1-3 names.

``````
> df
  id names
1  A     a
2  B   b,c
3  C   c,b
4  D     c
5  E   b,c
``````

How can I efficiently populate a matrix (5x3 in our example) with 0s (name not in row) and 1s (name in row)? The empty matrix would look like this:

``````
res <-
  matrix(nrow = nrow(df), ncol = length(names),
         dimnames = list(df$id, names), data = 0)

> res
  a b c
A 0 0 0
B 0 0 0
C 0 0 0
D 0 0 0
E 0 0 0
``````

Once filled, the first row would be (1,0,0), the second (0,1,1), etc.

We can use `table` after splitting the 'names' by `,`, and `stack`ing the `list` output to a `data.frame`.

``````
table(stack(setNames(strsplit(df$names, ","), df$id))[2:1])
#   values
#ind a b c
#  A 1 0 0
#  B 0 1 1
#  C 0 1 1
#  D 0 0 1
#  E 0 1 1
``````
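To see why the one-liner above works, it may help to run it in stages. The following is a minimal base R sketch; the intermediate names `parts`, `long`, `tab`, and `m` are illustrative and not part of the original answer.

```r
# Example data (same values as in the 'data' section)
df <- data.frame(id = c("A", "B", "C", "D", "E"),
                 names = c("a", "b,c", "c,b", "c", "b,c"),
                 stringsAsFactors = FALSE)

# 1. Split each comma-separated string and label each piece with its id
parts <- setNames(strsplit(df$names, ","), df$id)

# 2. stack() turns the named list into a long data.frame with
#    columns 'values' (the name) and 'ind' (the id)
long <- stack(parts)

# 3. [2:1] puts 'ind' first, so table() uses ids as rows and names as columns
tab <- table(long[2:1])

# Optional: drop the 'table' class to get a plain integer matrix
m <- unclass(tab)
names(dimnames(m)) <- NULL  # remove the 'ind'/'values' dimension labels
m
```

The last two lines are only needed if a plain matrix, as in the question's `res`, is preferred over a `table` object.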

Or another option is `mtabulate` from `qdapTools` after splitting the 'names' column.

``````
library(qdapTools)
mtabulate(setNames(strsplit(df$names, ","), df$id))
#  a b c
#A 1 0 0
#B 0 1 1
#C 0 1 1
#D 0 0 1
#E 0 1 1
``````
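Both approaches above build their columns from the values that actually occur in the data. If the full set of names is known in advance, as with `letters[1:3]` in the question, a dependency-free alternative is to test membership with `%in%`. This is only a sketch, not the original answer's method; the variable `all_names` is an assumed name, chosen to avoid shadowing base R's `names()` function.

```r
# Example data (same values as in the 'data' section)
df <- data.frame(id = c("A", "B", "C", "D", "E"),
                 names = c("a", "b,c", "c,b", "c", "b,c"),
                 stringsAsFactors = FALSE)

# Full set of possible names (assumed to be known up front)
all_names <- letters[1:3]

# Split each string on literal commas
parts <- strsplit(df$names, ",", fixed = TRUE)

# For each id, flag which of all_names are present (1) or absent (0).
# vapply() returns one column per id, so transpose to get ids as rows.
res <- t(vapply(parts,
                function(x) as.integer(all_names %in% x),
                integer(length(all_names))))
dimnames(res) <- list(df$id, all_names)
res
```

Because the columns come from `all_names`, a name that never occurs in `df` still gets an all-zero column.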

If we are using `dplyr`/`tidyr`, one option is `separate_rows` followed by `spread`:

``````
library(dplyr)
library(tidyr)
separate_rows(df, names) %>%
  mutate(v1 = 1) %>%
  spread(names, v1, fill = 0)
#  id a b c
#1  A 1 0 0
#2  B 0 1 1
#3  C 0 1 1
#4  D 0 0 1
#5  E 0 1 1
``````
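In newer `tidyr` releases `spread` has been superseded by `pivot_wider`. A rough equivalent might look like the sketch below. It assumes a reasonably recent `tidyr` in which `values_fill` accepts a single value; `wide` is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Example data (same values as in the 'data' section)
df <- data.frame(id = c("A", "B", "C", "D", "E"),
                 names = c("a", "b,c", "c,b", "c", "b,c"),
                 stringsAsFactors = FALSE)

wide <- df %>%
  separate_rows(names, sep = ",") %>%   # one row per (id, name) pair
  mutate(v1 = 1) %>%                    # indicator for "name present"
  pivot_wider(names_from = names,       # each name becomes a column
              values_from = v1,
              values_fill = 0)          # absent combinations become 0
wide
```

The result is a tibble with an `id` column rather than a matrix with row names.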

Or we can use `dcast` from `data.table` after splitting

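``````
library(data.table)
# setDT() converts df to a data.table by reference; split 'names' within each id,
# then count occurrences of each name (V1) per id
dcast(setDT(df)[, strsplit(names, ","), id], id ~ V1, length)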
``````

### data

``````
df <- structure(list(id = c("A", "B", "C", "D", "E"), names = c("a",
"b,c", "c,b", "c", "b,c")), .Names = c("id", "names"),
class = "data.frame", row.names = c("1", "2", "3", "4", "5"))
``````