MAPK - 1 year ago
R Question

# How to expand data matrix for corresponding column names

I have this data matrix called `mymat`. It has `.GT` columns for samples `00860` and `00861`. I want to expand this matrix with new `.AD` columns. The `.AD` column for each sample should have the value `50,0` if `.GT` is `0/0`, `25/25` if `.GT` is `0/1`, and `0,50` if `.GT` is `1/1`. I also want to add a `.DP` column for each sample, with the value `50` in every row, to get the `result` below. How can I do this kind of conditional expansion of a matrix in R?

```r
mymat <- structure(c("0/1", "1/1", "0/0", "0/0"), .Dim = c(2L, 2L),
                   .Dimnames = list(c("chr1:1163804", "chr1:1888193"),
                                    c("00860.GT", "00861.GT")))
```

result:

```
             00860.GT 00860.AD 00860.DP 00861.GT 00861.AD 00861.DP
chr1:1163804 0/1      25/25    50       0/0      50,0     50
chr1:1888193 1/1      0/50     50       0/0      50,0     50
```

Here's a data.table solution, with each step commented. It handles any number of `.GT` columns in your `mymat` object. Briefly:

1) First, we convert the matrix to a data.table, keeping the row names. This works for any number of columns, as long as they follow the same `<sample>.GT` naming.

2) We find all of the `.GT` columns and extract the sample ID before `.GT`.

3) We create a `.DP` column, filled with `50`, for each `.GT` column found.

4) We build the GT-to-AD mapping as a named vector: the elements are the "to" (AD) values, and the names are the "from" (GT) values.

5) We use data.table's `.SDcols` argument to apply the GT-to-AD mapping to every `.GT` column, creating the `.AD` columns.
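Steps 2 and 4 can be tried in isolation with a minimal base-R sketch (no data.table needed). The names `gt_cols`, `samples`, `gt_to_ad`, `gt_values` and `ad_values` are hypothetical, and the lookup values are taken from the expected result above. The pattern escapes the dot because in a regular expression an unescaped `.` matches any character.

```r
# Minimal sketch of steps 2 and 4; all variable names here are hypothetical.
gt_cols <- c("00860.GT", "00861.GT")

# Step 2: strip a trailing ".GT" (escaped dot, anchored at the end of the name)
samples <- sub("\\.GT$", "", gt_cols)
print(samples)

# Step 4: names hold the "from" GT values, elements hold the "to" AD values
gt_to_ad <- c("0/0" = "50,0", "0/1" = "25/25", "1/1" = "0/50")

# Indexing a named vector with character values looks each one up by name;
# unname() drops the names carried over from the lookup vector
gt_values <- c("0/1", "1/1", "0/0")
ad_values <- unname(gt_to_ad[gt_values])
print(ad_values)
```

Because the lookup is by name, a genotype that is missing from the vector (for example `./.`) comes back as `NA`, so you may want to add entries for any other values your data contains.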

```r
# Your matrix
mymat <- structure(c("0/1", "1/1", "0/0", "0/0"), .Dim = c(2L, 2L),
                   .Dimnames = list(c("chr1:1163804", "chr1:1888193"),
                                    c("00860.GT", "00861.GT")))

# Using a data table approach
library(data.table)

# Casting to data table - row names are kept in a column called 'rn'
mymat = as.data.table(mymat, keep.rownames = TRUE)

# Find ".GT" columns (escaped dot, anchored at the end of the name)
GTcols = grep("\\.GT$", colnames(mymat))

# Get the sample ID before ".GT"
selectedCols = sub("\\.GT$", "", colnames(mymat)[GTcols])

selectedCols
## [1] "00860" "00861"

# Create ".DP" columns; the parentheses tell data.table the LHS is a vector of names
mymat[, (paste0(selectedCols, ".DP")) := 50]

mymat
##              rn 00860.GT 00861.GT 00860.DP 00861.DP
## 1: chr1:1163804      0/1      0/0       50       50
## 2: chr1:1888193      1/1      0/0       50       50

# Create "GT" to "AD" mapping: names are the GT values ("from"),
# elements are the AD values ("to")
GTtoAD = c("0/0" = "50,0", "0/1" = "25/25", "1/1" = "0/50")

GTtoAD
##     0/0     0/1     1/1
##  "50,0" "25/25"  "0/50"

# This function will return the "AD" mapping given the values of "GT"
mapGTtoAD = function(gt) {
  unname(GTtoAD[gt])
}

# Here, we create the AD columns using the GT mapping
mymat[, (paste0(selectedCols, ".AD")) := lapply(.SD, mapGTtoAD),
      .SDcols = colnames(mymat)[GTcols]]

mymat
##              rn 00860.GT 00861.GT 00860.DP 00861.DP 00860.AD 00861.AD
## 1: chr1:1163804      0/1      0/0       50       50    25/25     50,0
## 2: chr1:1888193      1/1      0/0       50       50     0/50     50,0

# Reorder the columns to match your expected result (GT, AD, DP per sample)
colOrder = as.vector(rbind(paste0(selectedCols, ".GT"),
                           paste0(selectedCols, ".AD"),
                           paste0(selectedCols, ".DP")))
mymat = mymat[, c("rn", colOrder), with = FALSE]

mymat
```