Prradep - 1 month ago
R Question

subset a matrix using the rownames mapping and a user-defined function

I have a matrix and would like to subset it using a mapping of its row names and a function.

Example: a randomly populated matrix, using runif, with set.seed for reproducibility.

set.seed(1)
exp.mat <- matrix(runif(9*6, 5.0, 10), nrow = 9, ncol = 6)
rownames(exp.mat) <- c('a','b1','b2','b3','c','d1','d2','e1','e2')
colnames(exp.mat) <- c('s1','s2','s3','s4','s5','s6')

exp.mat
         s1       s2       s3       s4       s5       s6
a  5.353395 6.661973 6.733417 8.562573 6.198147 8.024666
b1 5.497331 8.254352 6.668875 6.999972 5.294672 8.273620
b2 6.581359 6.290084 7.381756 6.626761 8.211441 6.765986
b3 7.593171 7.392726 9.460992 8.785436 9.381346 6.351301
c  8.310025 8.831553 9.321697 6.013461 8.894573 9.963420
d1 7.034151 5.421235 6.949948 8.555606 8.986544 8.167466
d2 9.564380 9.376607 8.886603 5.608460 7.276372 6.066041
e1 6.468017 6.695365 9.803090 6.227443 7.050420 5.646862
e2 7.295329 9.197202 7.173297 5.716522 9.054351 7.390590


Mappings: column rown contains the rownames of the original matrix, and column map contains the corresponding mapping.

maps <- data.frame(rown=c('a','b1','b2','b3','c','d1','d2','e1','e1'),
map =c('a','b','b','b','c','d','d','e','f'))
maps

  rown map
1    a   a
2   b1   b
3   b2   b
4   b3   b
5    c   c
6   d1   d
7   d2   d
8   e1   e
9   e1   f


Function: mean is used here to select a row when several rown values map to the same map value (case 2).

apply(exp.mat, 1, mean)
a b1 b2 b3 c d1 d2 e1 e2
6.922362 6.831470 6.976231 8.160829 8.555789 7.519158 7.796410 6.981866 7.637882
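As a side note, rowMeans computes the same per-row means as apply(..., 1, mean) and is usually faster on large matrices. A minimal sketch on a small made-up matrix (m and both result names are hypothetical, not part of the example above):

```r
# Small made-up matrix, used only to compare the two calls
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2,
            dimnames = list(c('r1', 'r2'), c('s1', 's2', 's3')))

via.apply    <- apply(m, 1, mean)   # one call to mean() per row
via.rowMeans <- rowMeans(m)         # vectorised equivalent

# Both are named numeric vectors with the same values
all.equal(via.apply, via.rowMeans)
```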


Based on the mappings,


  1. If only one value in rown maps to a given value in map, copy the entire row directly. E.g. a and c each have only one mapping.

  2. If more than one value in rown maps to the same value in map, copy the entire row that has the highest value of the function above. E.g. b1, b2 and b3 map to b; b3 has the highest mean, so b3 has to be chosen, and likewise d2 for d.

  3. If a value in rown maps to more than one value in map, discard those rows. E.g. e1 has more than one mapping value (e and f).

  4. If a row has no mapping, discard it. E.g. e2 has no corresponding mapping.
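Cases 3 and 4 can be detected from the mapping alone, before any means are computed. A minimal base R sketch, assuming the maps data frame and row names from the example (rown.all, n.targets, ambiguous and unmapped are names introduced here for illustration):

```r
rown.all <- c('a','b1','b2','b3','c','d1','d2','e1','e2')  # rownames(exp.mat)
maps <- data.frame(rown = c('a','b1','b2','b3','c','d1','d2','e1','e1'),
                   map  = c('a','b','b','b','c','d','d','e','f'),
                   stringsAsFactors = FALSE)

# Case 3: row names that map to more than one distinct map value
n.targets <- tapply(maps$map, maps$rown, function(x) length(unique(x)))
ambiguous <- names(n.targets)[n.targets > 1]

# Case 4: row names of the matrix with no entry in the mapping
unmapped <- setdiff(rown.all, maps$rown)

ambiguous  # "e1"
unmapped   # "e2"
```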



Expected output: subsetted matrix

> exp.mat.trans
        s1       s2       s3       s4       s5       s6
a 5.353395 6.661973 6.733417 8.562573 6.198147 8.024666
b 7.593171 7.392726 9.460992 8.785436 9.381346 6.351301
c 8.310025 8.831553 9.321697 6.013461 8.894573 9.963420
d 9.564380 9.376607 8.886603 5.608460 7.276372 6.066041


Please advise how to achieve this efficiently.

I have achieved this by eyeballing the data and using the code below:

exp.mat.trans <- exp.mat[c(1,4,5,7),]
rownames(exp.mat.trans) <- c('a','b','c','d')


It might be enough to identify just the indices, since the values themselves are not transformed?

# Index Subsetting
ind <- c(1,4,5,7)
exp.mat.trans2 <- exp.mat[ind,]
rownames(exp.mat.trans2) <- maps[ind, 'map']


exp.mat.trans and exp.mat.trans2 are the same!
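The indices need not be picked by eye. One way to derive them in base R, as a sketch under the four rules above (n.rown, keep, score, best and exp.mat.trans3 are illustrative names; it assumes maps contains no fully duplicated rows, and ties in the mean are broken by taking the first row):

```r
set.seed(1)
exp.mat <- matrix(runif(9 * 6, 5.0, 10), nrow = 9, ncol = 6,
                  dimnames = list(c('a','b1','b2','b3','c','d1','d2','e1','e2'),
                                  c('s1','s2','s3','s4','s5','s6')))
maps <- data.frame(rown = c('a','b1','b2','b3','c','d1','d2','e1','e1'),
                   map  = c('a','b','b','b','c','d','d','e','f'),
                   stringsAsFactors = FALSE)

# Rules 3 and 4: keep mappings whose rown occurs exactly once in maps
# and exists in the matrix (rows absent from maps are never selected)
n.rown <- table(maps$rown)
keep <- maps[as.vector(n.rown[maps$rown]) == 1 &
               maps$rown %in% rownames(exp.mat), ]

# Rules 1 and 2: within each map value, pick the rown with the highest row mean
score <- rowMeans(exp.mat)[keep$rown]
best  <- tapply(seq_along(score), keep$map,
                function(i) i[which.max(score[i])])

# Row indices into exp.mat, plus the subset renamed to the map values
ind <- match(keep$rown[best], rownames(exp.mat))
exp.mat.trans3 <- exp.mat[ind, , drop = FALSE]
rownames(exp.mat.trans3) <- keep$map[best]
```

For the example data this should select the same rows as the hand-written ind, so exp.mat.trans3 can be compared against exp.mat.trans2 with all.equal.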

Answer

If you want an efficient solution, I think it would be better to use a data.table for the mapping. Your input matrix comes out differently when I run the code. I found the following solution to the problem:

set.seed(1)
exp.mat <- matrix(runif(9*6, 5.0, 10), nrow = 9, ncol = 6)
rownames(exp.mat) <- c('a','b1','b2','b3','c','d1','d2','e1','e2')
colnames(exp.mat) <- c('s1','s2','s3','s4','s5','s6')
> exp.mat
         s1       s2       s3       s4       s5       s6
a  6.327543 5.308931 6.900176 6.911940 8.971199 8.946781
b1 6.860619 6.029873 8.887226 9.348454 5.539718 5.116656
b2 7.864267 5.882784 9.673526 6.701745 8.618555 7.386150
b3 9.541039 8.435114 6.060713 7.410401 7.056372 8.661569
c  6.008410 6.920519 8.258369 7.997829 9.104731 8.463658
d1 9.491948 8.849207 5.627775 7.467707 8.235301 7.388098
d2 9.723376 7.488496 6.336103 5.931088 8.914664 9.306047
e1 8.303989 8.588093 6.930570 9.136867 7.765182 7.190486
e2 8.145570 9.959530 5.066952 8.342334 7.648598 6.223986
library(data.table)
maps <- data.table(rown=c('a','b1','b2','b3','c','d1','d2','e1','e1'),
                   map =c('a','b','b','b','c','d','d','e','f'))
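# RULE 2: calculate the mean of each row, looked up by row name
# (a positional rowMeans(exp.mat) would give the second e1 entry the mean of e2)
maps[, value := rowMeans(exp.mat)[rown]]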
# RULE 2: for each map value, keep the row name with the highest mean
maps <- maps[, rown[which.max(value)], by = map]
# RULE 3: count how many map values each selected row name was chosen for
number_map <- maps[,.N, by = V1]
setkey(maps, "V1")
# Drop row names that were chosen for more than one map value
maps <- maps[number_map[N < 2, V1]]
# Now subset the matrix
exp.mat[maps$V1[maps$V1 %in% rownames(exp.mat)],]
         s1       s2       s3       s4       s5       s6
a  6.327543 5.308931 6.900176 6.911940 8.971199 8.946781
b3 9.541039 8.435114 6.060713 7.410401 7.056372 8.661569
c  6.008410 6.920519 8.258369 7.997829 9.104731 8.463658
d2 9.723376 7.488496 6.336103 5.931088 8.914664 9.306047
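The result above keeps the original row names (b3, d2), whereas the expected output in the question uses the map values (b, d). A small sketch of the renaming step, where picked is a hypothetical stand-in for the final maps table (columns map and V1) and m is a made-up matrix with arbitrary values:

```r
# Hypothetical stand-in for the final `maps` table from the code above
picked <- data.frame(map = c('a', 'b', 'c', 'd'),
                     V1  = c('a', 'b3', 'c', 'd2'),
                     stringsAsFactors = FALSE)

# Made-up matrix with the same row names as exp.mat (values are arbitrary)
m <- matrix(seq_len(18), nrow = 9,
            dimnames = list(c('a','b1','b2','b3','c','d1','d2','e1','e2'),
                            c('s1', 's2')))

# drop = FALSE keeps a matrix even if only one row is selected
res <- m[picked$V1, , drop = FALSE]
rownames(res) <- picked$map
res
```

With the real objects, the same two lines would be applied to exp.mat and maps instead of m and picked.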