user2117258 - 1 year ago 33

R Question

I have a distance matrix whose row and column names each contain a numeric value after the first underscore (e.g., 7A_0_AAGCCTAGCGAC = 0). I would like a way to compare these values row against column. Say, for example, I'd like to take the difference between each row's value and each column's value.

Input:

`7A_0_AAGCCTAGCGAC 7A_4_AAATGACTGGCC 7A_7_CATCTCGTTCTA`

7A_0_AAGCCTAGCGAC 0.00000000 0.034312102 0.04539427

7A_4_AAATGACTGGCC 0.03431210 0.000000000 0.01422137

7A_7_CATCTCGTTCTA 0.04539427 0.014221369 0.00000000

Expected output:

`7A_0_AAGCCTAGCGAC 7A_4_AAATGACTGGCC 7A_7_CATCTCGTTCTA`

7A_0_AAGCCTAGCGAC 0.00000000 -4 -7

7A_4_AAATGACTGGCC 4 0.000000000 -3

7A_7_CATCTCGTTCTA 7 3 0.00000000

Any help would be much appreciated.

Answer Source

You can extract the numeric values from the row names and column names, then compute every pairwise difference with `outer`:

```
# extract the numeric value between the underscores from each dimension name
rows <- as.numeric(sub(".*_(\\d+)_.*", "\\1", rownames(mat)))
cols <- as.numeric(sub(".*_(\\d+)_.*", "\\1", colnames(mat)))
# outer difference: output[i, j] = rows[i] - cols[j]
# (rows first, so the result also has the right shape for a non-square matrix)
output <- outer(rows, cols, "-")
# carry over the dimension names from the original matrix
dimnames(output) <- list(rownames(mat), colnames(mat))
output
#                   7A_0_AAGCCTAGCGAC 7A_4_AAATGACTGGCC 7A_7_CATCTCGTTCTA
# 7A_0_AAGCCTAGCGAC                 0                -4                -7
# 7A_4_AAATGACTGGCC                 4                 0                -3
# 7A_7_CATCTCGTTCTA                 7                 3                 0
```
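
For anyone who wants to try this without the original data, here is a minimal self-contained sketch. It rebuilds `mat` from the values in the question and, as an alternative to the regular expression, splits each name on underscores and takes the second field. The helper name `label_value` is made up for illustration, and the sketch assumes every name has the form `prefix_number_sequence`.

```r
# Rebuild the example matrix from the values shown in the question.
labels <- c("7A_0_AAGCCTAGCGAC", "7A_4_AAATGACTGGCC", "7A_7_CATCTCGTTCTA")
mat <- matrix(
  c(0.00000000, 0.034312102, 0.04539427,
    0.03431210, 0.000000000, 0.01422137,
    0.04539427, 0.014221369, 0.00000000),
  nrow = 3, byrow = TRUE, dimnames = list(labels, labels)
)

# Hypothetical helper: return the second underscore-separated field as a number.
# Assumes names look like "prefix_number_sequence".
label_value <- function(x) {
  fields <- strsplit(x, "_", fixed = TRUE)
  as.numeric(vapply(fields, `[`, character(1), 2))
}

row_vals <- label_value(rownames(mat))
col_vals <- label_value(colnames(mat))

# diff_mat[i, j] = value of row i minus value of column j
diff_mat <- outer(row_vals, col_vals, "-")
dimnames(diff_mat) <- dimnames(mat)
diff_mat
```

Because the row values are passed to `outer` first, the same code should also work on a rectangular subset such as `mat[1:2, ]`, where the row and column names differ.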