MAPK - 1 year ago 68

R Question

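I have this data matrix called `mymat`, with `.GT` columns for the samples `00860` and `00861`. For each sample I want to add an `.AD` column and a `.DP` column:

`.AD` is `50,0` if `.GT` is `0/0`

`.AD` is `25/25` if `.GT` is `0/1`

`.AD` is `0,50` if `.GT` is `1/1`

`.DP` is `50`

How can I get the `result` shown below?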

`mymat <- structure(c("0/1", "1/1", "0/0", "0/0"), .Dim = c(2L, 2L), .Dimnames = list(c("chr1:1163804", "chr1:1888193"), c("00860.GT", "00861.GT")))`

result:

`00860.GT 00860.AD 00860.DP 00861.GT 00861.AD 00861.DP`

`chr1:1163804 0/1 25/25 50 0/0 50,0 50`

`chr1:1888193 1/1 0/50 50 0/0 50,0 50`

Answer

Here's a data.table solution, with each line commented. It is written to handle any number of columns in your `mymat` object. I will explain briefly:

1) First, we convert the matrix to a data.table, so that any number of columns can be handled as long as they follow the same naming format.

2) We find all of the ".GT" columns and extract the sample ID that precedes ".GT".

3) We create a ".DP" column for each ".GT" column found.

4) We build a "GT" to "AD" mapping as a named vector: the values are the "to" part of the mapping, and the names are the "from" part.

5) We use data.table's .SDcols argument to apply the "GT" to "AD" mapping to the ".GT" columns and create the ".AD" columns.

```
# Your matrix
mymat <- structure(c("0/1", "1/1", "0/0", "0/0"), .Dim = c(2L, 2L),
                   .Dimnames = list(c("chr1:1163804", "chr1:1888193"),
                                    c("00860.GT", "00861.GT")))

# Using a data.table approach
library(data.table)

# Cast to data.table; the row names are converted to a column called 'rn'
mymat <- as.data.table(mymat, keep.rownames = TRUE)

# Find the ".GT" columns (the dot is escaped and the match is anchored at the end)
GTcols <- grep("\\.GT$", colnames(mymat))

# Get the sample ID before ".GT"
selectedCols <- sub("\\.GT$", "", colnames(mymat)[GTcols])
selectedCols
# [1] "00860" "00861"

# Create the ".DP" columns; wrapping the LHS of := in parentheses makes
# data.table treat it as a vector of column names (replaces the deprecated with = F)
mymat[, (paste0(selectedCols, ".DP")) := 50]
mymat
#              rn 00860.GT 00861.GT 00860.DP 00861.DP
# 1: chr1:1163804      0/1      0/0       50       50
# 2: chr1:1888193      1/1      0/0       50       50

# Create the "GT" to "AD" mapping as a named vector
GTToADMapping <- c("50,0", "25/25", "0/50")
names(GTToADMapping) <- c("0/0", "0/1", "1/1")
GTToADMapping
#     0/0     0/1     1/1
#  "50,0" "25/25"  "0/50"

# This function returns the "AD" value for each "GT" value
# (unname() drops the lookup names from the result)
mapGTToAD <- function(x) {
  unname(GTToADMapping[x])
}

# Here, we create the ".AD" columns by applying the mapping to the ".GT" columns
mymat[, (paste0(selectedCols, ".AD")) := lapply(.SD, mapGTToAD),
      .SDcols = colnames(mymat)[GTcols]]
mymat
#              rn 00860.GT 00861.GT 00860.DP 00861.DP 00860.AD 00861.AD
# 1: chr1:1163804      0/1      0/0       50       50    25/25     50,0
# 2: chr1:1888193      1/1      0/0       50       50     0/50     50,0

# We can now order the columns as you have them (GT, AD, DP per sample)
colOrder <- as.vector(rbind(paste0(selectedCols, ".GT"),
                            paste0(selectedCols, ".AD"),
                            paste0(selectedCols, ".DP")))
mymat <- mymat[, c("rn", colOrder), with = FALSE]
mymat
#              rn 00860.GT 00860.AD 00860.DP 00861.GT 00861.AD 00861.DP
# 1: chr1:1163804      0/1    25/25       50      0/0     50,0       50
# 2: chr1:1888193      1/1     0/50       50      0/0     50,0       50

# Put it back in the format you had (a character matrix with row names)
mymat2 <- as.matrix(mymat[, -1, with = FALSE])
rownames(mymat2) <- mymat$rn
mymat2
#              00860.GT 00860.AD 00860.DP 00861.GT 00861.AD 00861.DP
# chr1:1163804 "0/1"    "25/25"  "50"     "0/0"    "50,0"   "50"
# chr1:1888193 "1/1"    "0/50"   "50"     "0/0"    "50,0"   "50"
```
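If you would rather not depend on data.table, the same named-vector lookup can be sketched in base R, working directly on the matrix. This is only an illustration, not part of the solution above: the helper name `add_ad_dp` and its arguments are made up for this example, and it assumes every column of the input is a ".GT" column and that `.DP` is the constant "50".

```r
# Hypothetical helper (name and arguments are assumptions for illustration).
# Assumes gt_mat is a character matrix whose columns are all named "<sample>.GT".
add_ad_dp <- function(gt_mat, dp = "50",
                      gt_to_ad = c("0/0" = "50,0", "0/1" = "25/25", "1/1" = "0/50")) {
  samples <- sub("\\.GT$", "", colnames(gt_mat))

  # Named-vector lookup: index the mapping by the GT strings, then restore the
  # matrix shape. GT values missing from the mapping become NA.
  ad <- matrix(unname(gt_to_ad[as.vector(gt_mat)]), nrow = nrow(gt_mat),
               dimnames = list(rownames(gt_mat), paste0(samples, ".AD")))

  # Constant DP column for every sample
  dp_mat <- matrix(dp, nrow = nrow(gt_mat), ncol = ncol(gt_mat),
                   dimnames = list(rownames(gt_mat), paste0(samples, ".DP")))

  out <- cbind(gt_mat, ad, dp_mat)

  # Interleave the columns as GT, AD, DP for each sample
  col_order <- as.vector(rbind(colnames(gt_mat), colnames(ad), colnames(dp_mat)))
  out[, col_order, drop = FALSE]
}

mymat <- structure(c("0/1", "1/1", "0/0", "0/0"), .Dim = c(2L, 2L),
                   .Dimnames = list(c("chr1:1163804", "chr1:1888193"),
                                    c("00860.GT", "00861.GT")))
result <- add_ad_dp(mymat)
result
```

Because the lookup returns `NA` for any genotype that is not in the mapping (a missing call, for instance), it is worth checking `anyNA(result)` before using the output.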