DeLuc - 1 year ago

R Question

For a few days I've been looking for a way to manage my data in R. I have the same set of individuals (n=5013), structured as follows: two asymmetric adjacency matrices (`m1` and `m2`) and a dataframe (`df`) with the individuals (`N`) and a variable (`df$V`).

I'm looking for a way to subset the matrices `m1` and `m2` using the variable `df$V`.

The following example illustrates my problem:

```
# N are individuals. Two matrices (m1 and m2) and a dataframe (df) with a variable (df$V)
> df
  N  V
1 a v1
2 b v2
3 c v3
4 d v1
5 e v2
6 f v3
7 g v1
> m1
  a b c d e f g
a 7 3 9 8 1 6 8
b 1 6 9 2 9 4 4
c 2 3 2 7 9 7 3
d 9 7 6 3 2 6 6
e 9 9 6 5 5 6 5
f 1 1 1 6 1 5 9
g 6 2 5 2 1 8 5
> m2
  a b c d e f g
a 8 3 7 8 4 3 2
b 2 8 4 2 7 7 2
c 8 3 1 6 9 9 4
d 7 3 6 7 4 9 5
e 5 8 7 1 7 6 6
f 9 6 8 9 6 6 2
g 4 8 8 1 9 7 3
```

For example, I subset the cells of the matrices whose rows take the values "v1" or "v3" and whose columns take the value "v2" in `df$V`:

```
> m1subseted
  b e
a 3 1
c 3 9
d 7 2
f 1 1
g 2 1
> m2subseted
  b e
a 3 4
c 3 9
d 3 4
f 6 6
g 8 9
```

Then, within the `m1` subset, I want to keep only the cells whose corresponding value in the `m2` subset is below 5 (or, equivalently, flag the other cells as invalid). The result I'm looking for is a matrix that is a subset of `m1`:

```
# subset m1 if the cell value in m2 is < 5 / invalid cells = NA
   b  e
a  3  1
c  3 NA
d  7  2
f NA NA
g NA NA
```

```
# Random example data: sample() draws new values on every run,
# so these will not match the matrices printed above
m1 <- as.matrix(data.frame(a = sample(1:10, size = 7),
                           b = sample(1:10, size = 7),
                           c = sample(1:10, size = 7),
                           d = sample(1:10, size = 7),
                           e = sample(1:10, size = 7),
                           f = sample(1:10, size = 7),
                           g = sample(1:10, size = 7)))
rownames(m1) <- colnames(m1)

m2 <- as.matrix(data.frame(a = sample(1:10, size = 7),
                           b = sample(1:10, size = 7),
                           c = sample(1:10, size = 7),
                           d = sample(1:10, size = 7),
                           e = sample(1:10, size = 7),
                           f = sample(1:10, size = 7),
                           g = sample(1:10, size = 7)))
rownames(m2) <- colnames(m2)

df <- data.frame(N = as.factor(letters[1:7]),
                 V = c("v1","v2","v3","v1","v2","v3","v1"))
```
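Because `sample()` draws new values on every run, the random data cannot reproduce the matrices printed in the example. Also, `sample(1:10, size = 7)` samples without replacement, so it can never produce a column with repeated values such as column `a` of `m1` (7, 1, 2, 9, 9, 1, 6). A minimal sketch that types the printed example in directly, so results can be checked against the expected output, might look like this. The helper vector `ids` is just an illustrative name, and `N` is stored as character rather than factor here.

```r
# The example data exactly as printed in the question
ids <- letters[1:7]

m1 <- matrix(c(7, 3, 9, 8, 1, 6, 8,
               1, 6, 9, 2, 9, 4, 4,
               2, 3, 2, 7, 9, 7, 3,
               9, 7, 6, 3, 2, 6, 6,
               9, 9, 6, 5, 5, 6, 5,
               1, 1, 1, 6, 1, 5, 9,
               6, 2, 5, 2, 1, 8, 5),
             nrow = 7, byrow = TRUE, dimnames = list(ids, ids))

m2 <- matrix(c(8, 3, 7, 8, 4, 3, 2,
               2, 8, 4, 2, 7, 7, 2,
               8, 3, 1, 6, 9, 9, 4,
               7, 3, 6, 7, 4, 9, 5,
               5, 8, 7, 1, 7, 6, 6,
               9, 6, 8, 9, 6, 6, 2,
               4, 8, 8, 1, 9, 7, 3),
             nrow = 7, byrow = TRUE, dimnames = list(ids, ids))

# N kept as character so it can be used directly for indexing by name
df <- data.frame(N = ids,
                 V = c("v1", "v2", "v3", "v1", "v2", "v3", "v1"),
                 stringsAsFactors = FALSE)
```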

The solution proposed by @jkt works fine, except when the labels are complex (with accent marks, parentheses, etc.), as in my original dataset. The workaround I found is to replace the complex labels with simpler ones before applying the algorithm, and then to restore the original labels on the result.

I'm sharing the code I used, based on the solution provided by @jkt (adapted to the example), in the hope that it is useful to someone.

```
# Create new labels. Here they are the numbers 1 to 7, where 7
# corresponds to the dimensions of the matrices and the number of observations in df
new.code.labels <- 1:7

# Create a new column in df
df$TempLabel <- new.code.labels

# Recode the rows and columns of the matrices
rownames(m1) <- new.code.labels
colnames(m1) <- new.code.labels
rownames(m2) <- new.code.labels
colnames(m2) <- new.code.labels

# Apply the algorithm proposed by @jkt
crit1 <- c('v1','v3')
crit2 <- 'v2'

# Note that the new labels in df (df$TempLabel) are used here
m11 <- m1[df$TempLabel[which(df$V %in% crit1)], df$TempLabel[which(df$V %in% crit2)]]
m21 <- m2[df$TempLabel[which(df$V %in% crit1)], df$TempLabel[which(df$V %in% crit2)]]
m11[!(m21 < 5)] <- NA
m11

# Restore the original labels on the result
row.coded.labels.result <- rownames(m11)
df.subseted.by.result.row <- subset(df, df$TempLabel %in% row.coded.labels.result)
rownames(m11) <- df.subseted.by.result.row$N

col.coded.labels.result <- colnames(m11)
df.subseted.by.result.col <- subset(df, df$TempLabel %in% col.coded.labels.result)
colnames(m11) <- df.subseted.by.result.col$N

m11
```
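A possible simplification of this workaround, assuming (as the code above already does) that the rows of `df` are in the same order as the rows and columns of both matrices: since `df$TempLabel` is an integer vector, `m1[df$TempLabel[...], ...]` selects rows and columns by position, not by name. Indexing directly with the positions returned by `which()` therefore does the same selection while keeping the matrices' original dimnames, however complex, so the relabel and restore steps could be skipped. The labels and values below are made up for illustration.

```r
# Made-up labels with accents and parentheses (written with \u escapes),
# in the same order as the matrix rows/columns and the rows of df
labs <- c("Ana (1)", "Jos\u00e9", "Chlo\u00e9", "D'Arcy", "\u00c9lodie")
m1 <- matrix(1:25, nrow = 5, dimnames = list(labs, labs))
m2 <- m1 %% 7  # arbitrary second matrix with the same dimnames
df <- data.frame(N = labs, V = c("v1", "v2", "v3", "v1", "v2"),
                 stringsAsFactors = FALSE)

# Positions, not names: relies on df and the matrices sharing one order
rows <- which(df$V %in% c("v1", "v3"))
cols <- which(df$V %in% "v2")

# drop = FALSE keeps a matrix even if only one row or column is selected
m11 <- m1[rows, cols, drop = FALSE]
m21 <- m2[rows, cols, drop = FALSE]
m11[!(m21 < 5)] <- NA

m11  # rows and columns still carry the original labels
```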

Answer

I would just use a series of subsetting commands.

This defines the two criteria (based on v1, v3 and v2):

```
crit1 <- c('v1','v3')
crit2 <- 'v2'
```

This subsets the matrices based on the criteria and the corresponding row/column names:

```
m11 <- m1[df$N[which(df$V %in% crit1)], df$N[which(df$V %in% crit2)]]
m21 <- m2[df$N[which(df$V %in% crit1)], df$N[which(df$V %in% crit2)]]
```
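One plausible explanation for why this breaks on labels with accents or parentheses: in the example `df$N` is created with `as.factor()`, and R indexes by a factor's integer codes, not by its printed labels (documented in `?Extract`). With the labels a to g, the sorted factor levels happen to match the order of the matrix rows, so the codes point at the right rows. With other labels, the sorted level order (which can also depend on the locale's collation) may differ from the matrix order, and the wrong rows are selected without any error. The small sketch below uses made-up labels to show the difference; wrapping the selection in `as.character()` indexes by name instead.

```r
# Made-up labels whose matrix order ("b", "a", "c") differs from sorted order
labs <- c("b", "a", "c")
m <- matrix(1:9, nrow = 3, dimnames = list(labs, labs))
df <- data.frame(N = factor(labs), V = c("v1", "v2", "v1"))

sel <- df$N[df$V == "v1"]  # factor with labels "b", "c" and codes 2, 3

by_code <- m[sel, , drop = FALSE]                # uses codes: rows 2 and 3 ("a", "c")
by_name <- m[as.character(sel), , drop = FALSE]  # uses names: rows "b" and "c"

rownames(by_code)
rownames(by_name)
```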

This sets to `NA` all values of `m11` whose counterpart in the second subset matrix (`m21`) does not meet your last criterion:

```
m11[!(m21<5)] <- NA
```
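If you would rather keep `m11` unchanged, a swapped-in alternative (not part of the answer above) is `ifelse()`, which returns a result with the attributes of its test argument, so a matrix test gives back a matrix with the same dimnames. This sketch types in the two subset matrices printed in the question:

```r
# The two subset matrices as printed in the question (filled column by column)
m11 <- matrix(c(3, 3, 7, 1, 2,
                1, 9, 2, 1, 1), ncol = 2,
              dimnames = list(c("a", "c", "d", "f", "g"), c("b", "e")))
m21 <- matrix(c(3, 3, 3, 6, 8,
                4, 9, 4, 6, 9), ncol = 2,
              dimnames = list(c("a", "c", "d", "f", "g"), c("b", "e")))

# Keep the m11 value where m21 < 5, otherwise NA; m11 itself is not modified
res <- ifelse(m21 < 5, m11, NA)
res
```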

Calling `m11` then gives you:

```
b e
a 3 1
c 3 NA
d 7 2
f NA NA
g NA NA
```

You could turn this into a function with all your criteria as arguments plus the matrices and the dataframe.
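A minimal sketch of such a function is below. The name `subset_by_groups` and its argument names are hypothetical, and it assumes the dimnames of both matrices are the ids stored in the id column of the dataframe. It converts the ids with `as.character()` so the matrices are indexed by name even when that column is a factor, and uses `drop = FALSE` so a single selected row or column still yields a matrix.

```r
# Hypothetical helper; name and arguments are illustrative only.
# Assumes rownames/colnames of m1 and m2 are the ids in df[[id_col]].
subset_by_groups <- function(m1, m2, df, row_groups, col_groups,
                             threshold = 5, id_col = "N", group_col = "V") {
  ids <- as.character(df[[id_col]])  # index by name, not by factor code
  groups <- df[[group_col]]
  rows <- ids[groups %in% row_groups]
  cols <- ids[groups %in% col_groups]
  out <- m1[rows, cols, drop = FALSE]
  # Set to NA every cell whose counterpart in m2 is not below the threshold
  out[!(m2[rows, cols, drop = FALSE] < threshold)] <- NA
  out
}

# Example data from the question
ids <- letters[1:7]
m1 <- matrix(c(7, 3, 9, 8, 1, 6, 8,
               1, 6, 9, 2, 9, 4, 4,
               2, 3, 2, 7, 9, 7, 3,
               9, 7, 6, 3, 2, 6, 6,
               9, 9, 6, 5, 5, 6, 5,
               1, 1, 1, 6, 1, 5, 9,
               6, 2, 5, 2, 1, 8, 5),
             nrow = 7, byrow = TRUE, dimnames = list(ids, ids))
m2 <- matrix(c(8, 3, 7, 8, 4, 3, 2,
               2, 8, 4, 2, 7, 7, 2,
               8, 3, 1, 6, 9, 9, 4,
               7, 3, 6, 7, 4, 9, 5,
               5, 8, 7, 1, 7, 6, 6,
               9, 6, 8, 9, 6, 6, 2,
               4, 8, 8, 1, 9, 7, 3),
             nrow = 7, byrow = TRUE, dimnames = list(ids, ids))
df <- data.frame(N = factor(ids),
                 V = c("v1", "v2", "v3", "v1", "v2", "v3", "v1"))

res <- subset_by_groups(m1, m2, df, row_groups = c("v1", "v3"), col_groups = "v2")
res
```

On the example data this should give the same matrix as the result shown above.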