sdhaoui sdhaoui - 1 month ago 8
R Question

How to get the row sums of a matrix containing character strings, using corresponding values from another vector

Hi, I have this column extracted from my data:

x <- data.frame(Category=factor(c("xxyyxyxyx", "xxyyyyxyx", "xxyyxyxyy",
"yxyyxyxyx", "xxyyxyyyx")))
> x
   Category
1 xxyyxyxyx
2 xxyyyyxyx
3 xxyyxyxyy
4 yxyyxyxyx
5 xxyyxyyyx


For each row, I have to calculate the sum of the values corresponding to each group of three characters, so I generated this matrix:

xx <- t(apply(x, 1, function(x){strsplit(gsub("([[:alnum:]]{3})", "\\1 ", x), " ")[[1]]}))

> xx

     [,1]  [,2]  [,3]
[1,] "xxy" "yxy" "xyx"
[2,] "xxy" "yyy" "xyx"
[3,] "xxy" "yxy" "xyy"
[4,] "yxy" "yxy" "xyx"
[5,] "xxy" "yxy" "yyx"


Each cell of xx corresponds to a value given in this vector:

matval=c("xxy"=3, "yxy"=2, "xyx"=7, "xyy"=5, "yyx"=12, "yyy"= 4)


Based on the matrix xx, I would like to add to the data frame x a column containing the sum of each row, i.e.,

> x
   Category RowSum
1 xxyyxyxyx     12
2 xxyyyyxyx     14
3 xxyyxyxyy     10
4 yxyyxyxyx     11
5 xxyyxyyyx     17


Many thanks in advance!

Answer

1) matval[xx] looks up the value for each cell of xx by name. The result is a plain vector, so it is reshaped back into a matrix with the dimensions of xx and then summed by row:

transform(x, RowSum = rowSums(array(matval[xx], dim(xx))))

giving:

   Category RowSum
1 xxyyxyxyx     12
2 xxyyyyxyx     14
3 xxyyxyxyy     10
4 yxyyxyxyx     11
5 xxyyxyyyx     17
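
To make the intermediate steps of 1) visible, here is a minimal sketch that builds xx by hand from the question's data and splits the one-liner into parts. The names vals, m and row_sum are only for illustration and do not appear in the original answer.

```r
# Data from the question, typed in directly as a 5 x 3 character matrix
xx <- matrix(c("xxy", "yxy", "xyx",
               "xxy", "yyy", "xyx",
               "xxy", "yxy", "xyy",
               "yxy", "yxy", "xyx",
               "xxy", "yxy", "yyx"), ncol = 3, byrow = TRUE)
matval <- c("xxy" = 3, "yxy" = 2, "xyx" = 7, "xyy" = 5, "yyx" = 12, "yyy" = 4)

# Indexing the named vector by a character matrix looks up every cell by name,
# but returns a plain vector of length 15 (column-major order), without dims.
vals <- matval[xx]

# Restore the 5 x 3 shape, then sum across each row.
m <- array(vals, dim(xx))
row_sum <- rowSums(m)
row_sum
```

Because matval[xx] walks the matrix in column-major order and array() fills in the same order, each value lands back in the cell it was looked up from.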

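2) An alternative that computes the result directly from x, without computing xx first, uses strapply from the gsubfn package. It extracts each consecutive group of three characters and applies matval[...] to each extract. With simplify = TRUE the result is a matrix with one column per input string, so the column sums give the total for each row of x.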

library(gsubfn)

transform(x, RowSum =
   colSums(strapply(paste(Category), "...", s ~ matval[s], simplify = TRUE)))
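
If installing gsubfn is not an option, the same idea can probably be expressed in base R. The following is a sketch under that assumption, not part of the original answer: it swaps strapply for regmatches/gregexpr, and the names cat_chr and pieces are hypothetical.

```r
x <- data.frame(Category = factor(c("xxyyxyxyx", "xxyyyyxyx", "xxyyxyxyy",
                                    "yxyyxyxyx", "xxyyxyyyx")))
matval <- c("xxy" = 3, "yxy" = 2, "xyx" = 7, "xyy" = 5, "yyx" = 12, "yyy" = 4)

# Work on character strings rather than the factor's integer codes.
cat_chr <- as.character(x$Category)

# Split each string into consecutive three-character pieces
# (a list with one character vector per row of x).
pieces <- regmatches(cat_chr, gregexpr("...", cat_chr))

# Look up each piece in matval and add up the values per string.
x$RowSum <- vapply(pieces, function(p) sum(matval[p]), numeric(1))
x
```

Unlike the matrix approach, this does not require every string to split into the same number of pieces.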

Note: Another way to compute xx is to insert a space after every third character, read it into a data frame and convert that to a matrix.

as.matrix(read.table(text = gsub("(...)", "\\1 ", x$Category)))
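
As a quick check that a matrix built this way can be used in 1), the sketch below runs the whole pipeline on the question's data. The name xx2 is only for illustration; the matrix it holds carries column names (V1, V2, V3) from read.table, which does not affect the lookup.

```r
x <- data.frame(Category = factor(c("xxyyxyxyx", "xxyyyyxyx", "xxyyxyxyy",
                                    "yxyyxyxyx", "xxyyxyyyx")))
matval <- c("xxy" = 3, "yxy" = 2, "xyx" = 7, "xyy" = 5, "yyx" = 12, "yyy" = 4)

# Insert a space after every three characters, then let read.table
# split each string on whitespace into three columns.
xx2 <- as.matrix(read.table(text = gsub("(...)", "\\1 ", x$Category)))

# Same lookup-and-reshape step as in 1), applied to xx2.
x$RowSum <- rowSums(array(matval[xx2], dim(xx2)))
x
```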