sdhaoui - 3 months ago 11

R Question

Hi I have this column extracted from my data:

`x <- data.frame(Category=factor(c("xxyyxyxyx", "xxyyyyxyx", "xxyyxyxyy", "yxyyxyxyx", "xxyyxyyyx")))`

> x

Category

1 xxyyxyxyx

2 xxyyyyxyx

3 xxyyxyxyy

4 yxyyxyxyx

5 xxyyxyyyx

I have to calculate a row sum from each group of three characters in each row, so I generated this matrix:

`xx <- t(apply(x, 1, function(x){strsplit(gsub("([[:alnum:]]{3})", "\\1 ", x), " ")[[1]]}))`

> xx

[,1] [,2] [,3]

[1,] "xxy" "yxy" "xyx"

[2,] "xxy" "yyy" "xyx"

[3,] "xxy" "yxy" "xyy"

[4,] "yxy" "yxy" "xyx"

[5,] "xxy" "yxy" "yyx"

Each three-character string in `xx` has a corresponding value, given by:

`matval=c("xxy"=3, "yxy"=2, "xyx"=7, "xyy"=5, "yyx"=12, "yyy"= 4)`

Based on the matrix `xx`, I would like to add a row-sum column to `x`, so that `x` looks like this:

Category RowSum

1 xxyyxyxyx 12

2 xxyyyyxyx 14

3 xxyyxyxyy 10

4 yxyyxyxyx 11

5 xxyyxyyyx 17

Many thanks in advance!

Answer

**1)** `matval[xx]` will give the individual values, which can then be shaped back into a matrix and summed:

```
transform(x, RowSum = rowSums(array(matval[xx], dim(xx))))
```

giving:

```
Category RowSum
1 xxyyxyxyx 12
2 xxyyyyxyx 14
3 xxyyxyxyy 10
4 yxyyxyxyx 11
5 xxyyxyyyx 17
```
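To make the steps in 1) easier to follow, here is a minimal sketch that breaks the one-liner apart, using the data from the question. The intermediate names `vals`, `m` and `row_sum` are only illustrative. It relies on `matval[xx]` dropping the matrix shape and returning a plain vector in column-major order, which is why `array(..., dim(xx))` puts every value back in its original cell.

```r
# Data and lookup vector reproduced from the question
x <- data.frame(Category = factor(c("xxyyxyxyx", "xxyyyyxyx", "xxyyxyxyy",
                                    "yxyyxyxyx", "xxyyxyyyx")))
xx <- t(apply(x, 1, function(s) {
  strsplit(gsub("([[:alnum:]]{3})", "\\1 ", s), " ")[[1]]
}))
matval <- c("xxy" = 3, "yxy" = 2, "xyx" = 7, "xyy" = 5, "yyx" = 12, "yyy" = 4)

# Step 1: look up each triple by name; the result is a vector of length 15
vals <- matval[xx]

# Step 2: restore the 5 x 3 shape of xx (both are column-major, so cells line up)
m <- array(vals, dim(xx))

# Step 3: sum across each row
row_sum <- rowSums(m)
row_sum
```

A triple that is missing from `matval` would come back as `NA` in step 1 and make that row's sum `NA`, so it may be worth checking `anyNA(vals)` on real data.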

**2)** An alternative that computes the result directly from `x`, without computing `xx` first, is the following. It extracts each successive run of three characters, applies `matval[...]` to each extract, and then sums the columns of the resulting matrix.

```
library(gsubfn)  # provides strapply()
# the simplified result has one column per input string, hence colSums()
transform(x, RowSum =
  colSums(strapply(paste(Category), "...", s ~ matval[s], simplify = TRUE)))
```
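If installing gsubfn is not an option, the same idea can be sketched in base R. This is not what the answer above uses, just a rough equivalent: the helper name `triple_sum` and its `width` argument are made up for illustration, and it assumes every string has a length that is a multiple of three and that every triple appears in `matval`.

```r
# Data and lookup vector reproduced from the question
x <- data.frame(Category = factor(c("xxyyxyxyx", "xxyyyyxyx", "xxyyxyxyy",
                                    "yxyyxyxyx", "xxyyxyyyx")))
matval <- c("xxy" = 3, "yxy" = 2, "xyx" = 7, "xyy" = 5, "yyx" = 12, "yyy" = 4)

# Hypothetical helper: cut one string into fixed-width pieces and sum their values
triple_sum <- function(s, values, width = 3) {
  starts <- seq(1, nchar(s), by = width)
  pieces <- substring(s, starts, starts + width - 1)
  sum(values[pieces])  # NA if any piece is not a name in `values`
}

# Apply the helper to each string; as.character() avoids working on factor codes
x$RowSum <- vapply(as.character(x$Category), triple_sum, numeric(1),
                   values = matval, USE.NAMES = FALSE)
x
```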

**Note:** Another way to compute `xx` is to insert a space after every third character, read it into a data frame and convert that to a matrix.

```
as.matrix(read.table(text = gsub("(...)", "\\1 ", x$Category)))
```
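As a rough illustration of what that one-liner does, the sketch below splits it into its two stages on the question's data. The names `spaced` and `xx2` are only illustrative. The resulting matrix carries column names `V1` to `V3` from `read.table`, which should not affect the `matval[...]` lookup because only the cell values are used as names.

```r
# Data reproduced from the question
x <- data.frame(Category = factor(c("xxyyxyxyx", "xxyyyyxyx", "xxyyxyxyy",
                                    "yxyyxyxyx", "xxyyxyyyx")))

# Stage 1: put a space after every three characters
spaced <- gsub("(...)", "\\1 ", x$Category)
spaced[1]

# Stage 2: parse the whitespace-separated fields into columns, then to a matrix
xx2 <- as.matrix(read.table(text = spaced))
dim(xx2)
```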