pachamaltese - 8 months ago 36

R Question

I need to create a matrix of 1s and 0s that records which terms appear in which column. I already have a matrix counting the common terms between columns (e.g. with rows like 1,4,2), but I cannot figure out how to disaggregate it.

Here is a toy and reproducible example. Steps (1)-(4) are ok and step (5) is what I cannot do at the moment.

(1) I have this (fictional) dataset

```
vec1 <- c("apple","pear","apple and pear")
vec2 <- c("apple and pear","banana","orange")
vec3 <- c("orange and pear","banana","apple")
my.data.frame <- as.data.frame(cbind(vec1,vec2,vec3))
```

```
            vec1           vec2            vec3
1          apple apple and pear orange and pear
2           pear         banana          banana
3 apple and pear         orange           apple
```

(2) I extract the variables and the content

```
vectors.list <- as.vector(colnames(my.data.frame))
list.of.fruits <- unique(as.vector(unlist(my.data.frame)))
```

(3) I wrote a function to count common terms (this is an adaptation of this post: How to count common words and store the result in a matrix?)

```
common.fruits <- function(vList) {
  v <- lapply(vList, tolower)
  do.call(rbind, lapply(v, function(x) {
    do.call(c, lapply(v, function(y) length(intersect(x, y))))
  }))
}
```

(4) I use get and lapply to run the comparison efficiently (I guess)

```
compare <- lapply(vectors.list,get)
common.terms.matrix <- common.fruits(compare)
rownames(common.terms.matrix) <- vectors.list
colnames(common.terms.matrix) <- vectors.list
common.terms.matrix
```

```
     vec1 vec2 vec3
vec1    3    1    1
vec2    1    3    1
vec3    1    1    3
```

(5) How do I disaggregate that last matrix into this matrix or data.frame? (The "|" are to indicate that this was written by hand.)

```
     | apple | pear | apple and pear | banana | orange | orange and pear
vec1 | 1     | 1    | 1              | 0      | 0      | 0
vec2 | 0     | 0    | 1              | 1      | 1      | 0
vec3 | 1     | 0    | 0              | 1      | 0      | 1
```

Answer

You can try something like the following, which uses melt and dcast from the reshape2 package:

```
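library(reshape2)  # provides melt() and dcast()

# Add a row id so melt() has an id variable to keep
my.data.frame$id <- 1:nrow(my.data.frame)
# Long format: one row per (id, column name, fruit string)
m <- melt(my.data.frame, id='id')
# Marker that becomes the 1 in the indicator table
m$val <- 1
# Wide format: one row per original column, one column per fruit string
df <- dcast(m, variable~value, value.var='val')
# Combinations that never occur come back as NA; turn them into 0
df[is.na(df)] <- 0
df
#   variable apple apple and pear banana orange orange and pear pear
# 1     vec1     1              1      0      0               0    1
# 2     vec2     0              1      1      1               0    0
# 3     vec3     1              0      1      0               1    0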
```
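If you would rather not depend on an extra package, the same indicator table can probably be built in base R straight from the original vectors. This is only a sketch under the assumption that each cell is matched as a whole string (so "apple" and "apple and pear" are different terms, as in the question); the names fruit.columns, all.fruits, indicator and shared are made up for illustration.

```r
# Toy data from the question
vec1 <- c("apple", "pear", "apple and pear")
vec2 <- c("apple and pear", "banana", "orange")
vec3 <- c("orange and pear", "banana", "apple")

# Illustrative names, not from the original post
fruit.columns <- list(vec1 = vec1, vec2 = vec2, vec3 = vec3)
all.fruits <- unique(unlist(fruit.columns, use.names = FALSE))

# One row per vector, one column per distinct string:
# 1 if the vector contains that string, 0 otherwise
indicator <- t(sapply(fruit.columns, function(v) as.integer(all.fruits %in% v)))
colnames(indicator) <- all.fruits
indicator

# Multiplying the indicator matrix by its transpose counts the terms
# shared by each pair of vectors
shared <- tcrossprod(indicator)
shared
```

Note that the direction matters: the count matrix from step (4) can be derived from the indicator matrix (that is what tcrossprod does here), but the indicator matrix cannot be recovered from the counts alone, so it has to be built from the original data.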