DeLuc - 1 year ago 72
R Question

# R - Subset a matrix according to a variable from a data frame and the values of another matrix

For a few days I have been looking for a way to manage my data in R. I have the same set of individuals (n = 5013) structured as follows: two asymmetric adjacency matrices (`m1` and `m2`), both n×n square matrices whose rows and columns are the individuals, and a data frame (`df`) listing the individuals (`N`) together with one variable (`df$V`).

I am looking for a way to subset the matrices using the variable `df$V` (with different criteria, i.e. different variable values, for rows and columns), and then to subset `m1` (or flag invalid cells) according to the cell values of `m2`.

The following example illustrates my problem:

``````# N are individuals. Two matrices (m1 and m2) and a data frame (df) with a variable (df$V)
> df
  N  V
1 a v1
2 b v2
3 c v3
4 d v1
5 e v2
6 f v3
7 g v1

> m1
  a b c d e f g
a 7 3 9 8 1 6 8
b 1 6 9 2 9 4 4
c 2 3 2 7 9 7 3
d 9 7 6 3 2 6 6
e 9 9 6 5 5 6 5
f 1 1 1 6 1 5 9
g 6 2 5 2 1 8 5

> m2
  a b c d e f g
a 8 3 7 8 4 3 2
b 2 8 4 2 7 7 2
c 8 3 1 6 9 9 4
d 7 3 6 7 4 9 5
e 5 8 7 1 7 6 6
f 9 6 8 9 6 6 2
g 4 8 8 1 9 7 3
``````

For example, I subset the cells of both matrices where the rows have the values “v1” or “v3” and the columns have the value “v2” in `df$V`:

``````> m1subseted
  b e
a 3 1
c 3 9
d 7 2
f 1 1
g 2 1
> m2subseted
  b e
a 3 4
c 3 9
d 3 4
f 6 6
g 8 9
``````

Then, in the subset of m1, I want to keep only the cells (or flag the invalid ones) whose corresponding cell in the subset of m2 has a value below 5. The result I am looking for is a matrix that is a subset of m1:

``````# Subset of m1 keeping only cells whose m2 value is < 5; invalid cells = NA
   b  e
a  3  1
c  3 NA
d  7  2
f NA NA
g NA NA
``````

# Reproducible data

``````# Fix the seed so that the randomly generated values are the same on every run.
# The values are random, so they will differ from the example printed above.
set.seed(1)

m1 <- as.matrix(data.frame(a = sample(1:10, size = 7),
                           b = sample(1:10, size = 7),
                           c = sample(1:10, size = 7),
                           d = sample(1:10, size = 7),
                           e = sample(1:10, size = 7),
                           f = sample(1:10, size = 7),
                           g = sample(1:10, size = 7)))
rownames(m1) <- colnames(m1)

m2 <- as.matrix(data.frame(a = sample(1:10, size = 7),
                           b = sample(1:10, size = 7),
                           c = sample(1:10, size = 7),
                           d = sample(1:10, size = 7),
                           e = sample(1:10, size = 7),
                           f = sample(1:10, size = 7),
                           g = sample(1:10, size = 7)))
rownames(m2) <- colnames(m2)

df <- data.frame(N = as.factor(letters[1:7]),
                 V = c("v1", "v2", "v3", "v1", "v2", "v3", "v1"))
``````

# Comment

The solution proposed by @jkt works fine, except when the labels are complex (with accent marks, parentheses, etc.), as in my original dataset. The workaround I found is to replace the complex labels with simpler ones before applying the algorithm, and to restore the original labels on the result.
I am sharing the code I used with the solution provided by @jkt (adapted to the example) in the hope that it will be useful to someone.

``````# Create new labels. Here they are the numbers 1 to 7, where 7 is
# the dimension of the matrices and the number of observations in df
new.code.labels <- 1:7
# Create a new column/variable in df
df$TempLabel <- new.code.labels
# Recode the row and column names of the matrices
rownames(m1) <- new.code.labels
colnames(m1) <- new.code.labels
rownames(m2) <- new.code.labels
colnames(m2) <- new.code.labels

# Apply the algorithm proposed by @jkt
crit1 <- c('v1', 'v3')
crit2 <- 'v2'
# Note that the new labels from the data frame (df$TempLabel) are used here
m11 <- m1[df$TempLabel[which(df$V %in% crit1)], df$TempLabel[which(df$V %in% crit2)]]
m21 <- m2[df$TempLabel[which(df$V %in% crit1)], df$TempLabel[which(df$V %in% crit2)]]
m11[!(m21 < 5)] <- NA
m11

# Restore the original labels on the result
row.coded.labels.result <- rownames(m11)
df.subseted.by.result.row <- subset(df, df$TempLabel %in% row.coded.labels.result)
rownames(m11) <- df.subseted.by.result.row$N

col.coded.labels.result <- colnames(m11)
df.subseted.by.result.col <- subset(df, df$TempLabel %in% col.coded.labels.result)
colnames(m11) <- df.subseted.by.result.col$N
m11
``````
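A possible explanation for the trouble with complex labels, not confirmed in this thread, is that `df$N` is a factor. R indexes with a factor by its integer codes rather than by its printed labels, so the selection is only correct when the sorted factor levels happen to follow the same order as the matrix rows and columns. The sketch below uses made-up labels and hypothetical object names (`labs`, `m`, `d`) to show the difference; converting with `as.character()` makes R look the names up in the dimnames instead.

```r
# Made-up labels whose matrix order differs from their alphabetical order
labs <- c("z (1)", "a-b", "m")
m <- matrix(1:9, nrow = 3, byrow = TRUE, dimnames = list(labs, labs))
d <- data.frame(N = factor(labs), V = c("v1", "v2", "v1"))

# Factor levels are sorted: "a-b" = 1, "m" = 2, "z (1)" = 3
sel <- d$N[d$V %in% "v1"]          # values "z (1)" and "m", codes 3 and 2

# Indexing with the factor uses the codes 3 and 2, i.e. the wrong rows
by_code <- m[sel, , drop = FALSE]
rownames(by_code)                  # "m" "a-b"

# Indexing with character strings matches the row names as intended
by_name <- m[as.character(sel), , drop = FALSE]
rownames(by_name)                  # "z (1)" "m"
```

If this is indeed the cause, wrapping the index in `as.character()` would avoid the temporary relabelling step altogether.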

I would just use a series of subsetting commands.

This defines the two criteria (based on v1, v3 and v2):

``````crit1 <- c('v1','v3')
crit2 <- 'v2'
``````

This subsets the matrices based on the criteria and the corresponding row/column names:

``````m11 <- m1[df$N[which(df$V %in% crit1)], df$N[which(df$V %in% crit2)]]
m21 <- m2[df$N[which(df$V %in% crit1)], df$N[which(df$V %in% crit2)]]
``````

This sets to `NA` every value in `m11` whose counterpart in the second subset matrix, `m21`, does not meet your last criterion (a value below 5):

``````m11[!(m21<5)] <- NA
``````

Calling `m11` then gives you:

``````   b  e
a  3  1
c  3 NA
d  7  2
f NA NA
g NA NA
``````

You could turn this into a function with all your criteria as arguments plus the matrices and the dataframe.
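A minimal sketch of such a function is given below. The name `mask_subset`, its argument names and the `threshold` default are illustrative assumptions, not part of the original answer. It indexes by character names (so it does not depend on factor codes) and uses `drop = FALSE` so that the result stays a matrix even when a criterion selects a single row or column. The data are copied from the example in the question.

```r
# Hypothetical helper: subset m1 by row/column criteria on df$V and
# set to NA every cell whose counterpart in m2 is not below `threshold`.
mask_subset <- function(m1, m2, df, row_crit, col_crit, threshold = 5) {
  # Character names, so the lookup goes through the dimnames
  rows <- as.character(df$N[df$V %in% row_crit])
  cols <- as.character(df$N[df$V %in% col_crit])
  out <- m1[rows, cols, drop = FALSE]
  ref <- m2[rows, cols, drop = FALSE]
  out[!(ref < threshold)] <- NA
  out
}

# Example data copied from the question
ids <- letters[1:7]
m1 <- matrix(c(7, 3, 9, 8, 1, 6, 8,
               1, 6, 9, 2, 9, 4, 4,
               2, 3, 2, 7, 9, 7, 3,
               9, 7, 6, 3, 2, 6, 6,
               9, 9, 6, 5, 5, 6, 5,
               1, 1, 1, 6, 1, 5, 9,
               6, 2, 5, 2, 1, 8, 5),
             nrow = 7, byrow = TRUE, dimnames = list(ids, ids))
m2 <- matrix(c(8, 3, 7, 8, 4, 3, 2,
               2, 8, 4, 2, 7, 7, 2,
               8, 3, 1, 6, 9, 9, 4,
               7, 3, 6, 7, 4, 9, 5,
               5, 8, 7, 1, 7, 6, 6,
               9, 6, 8, 9, 6, 6, 2,
               4, 8, 8, 1, 9, 7, 3),
             nrow = 7, byrow = TRUE, dimnames = list(ids, ids))
df <- data.frame(N = factor(ids),
                 V = c("v1", "v2", "v3", "v1", "v2", "v3", "v1"))

result <- mask_subset(m1, m2, df, row_crit = c("v1", "v3"), col_crit = "v2")
result
```

With the example data, `result` has rows a, c, d, f, g and columns b, e, with the same values and `NA` cells as the matrix printed above.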
