neversaint neversaint - 3 months ago 12
R Question

How to perform pairwise division based on row grouping

I have a data frame that is made the following way:

df <- structure(list(celltype = structure(c(1L, 1L, 2L, 2L, 3L, 3L,
4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L), .Label = c("Bcells",
"DendriticCells", "Macrophages", "Monocytes", "NKCells", "Neutrophils",
"StemCells", "StromalCells", "abTcells", "gdTCells"), class = "factor"),
sample = c("SP ID control", "SP ID treated", "SP ID control",
"SP ID treated", "SP ID control", "SP ID treated", "SP ID control",
"SP ID treated", "SP ID control", "SP ID treated", "SP ID control",
"SP ID treated", "SP ID control", "SP ID treated", "SP ID control",
"SP ID treated", "SP ID control", "SP ID treated", "SP ID control",
"SP ID treated"), `mean(score)` = c(0.160953535029424, 0.155743474395545,
0.104788051104575, 0.125247035158472, -0.159665650045289,
-0.134662049979712, 0.196249441751866, 0.212256889027029,
0.0532668251890109, 0.0738264693971133, 0.151828478029596,
0.159941552142933, -0.14128323638966, -0.120556640790534,
0.196518649474078, 0.185264282171863, 0.0654641151966543,
0.0837989059507186, 0.145111577618456, 0.145448549866796)), .Names = c("celltype",
"sample", "mean(score)"), row.names = c(7L, 8L, 17L, 18L, 27L,
28L, 37L, 38L, 47L, 48L, 57L, 58L, 67L, 68L, 77L, 78L, 87L, 88L,
97L, 98L), class = "data.frame")


It looks like this:

> df
celltype sample mean(score)
7 Bcells SP ID control 0.16095354
8 Bcells SP ID treated 0.15574347
17 DendriticCells SP ID control 0.10478805
18 DendriticCells SP ID treated 0.12524704
27 Macrophages SP ID control -0.15966565
28 Macrophages SP ID treated -0.13466205
37 Monocytes SP ID control 0.19624944
38 Monocytes SP ID treated 0.21225689
47 NKCells SP ID control 0.05326683
48 NKCells SP ID treated 0.07382647
57 Neutrophils SP ID control 0.15182848
58 Neutrophils SP ID treated 0.15994155
67 StemCells SP ID control -0.14128324
68 StemCells SP ID treated -0.12055664
77 StromalCells SP ID control 0.19651865
78 StromalCells SP ID treated 0.18526428
87 abTcells SP ID control 0.06546412
88 abTcells SP ID treated 0.08379891
97 gdTCells SP ID control 0.14511158
98 gdTCells SP ID treated 0.14544855


What I want to do is to compute division of score based on
treated
and
control
sample within
cell type
grouping.

The following Excel image illustrate the example. We're after the right most column. For example in Bcells (0.155/0.161 = 0.967).

enter image description here

At the end of the day I'd like to get the df that looks like this:

celltype sample Pairwise division
Bcells SP ID treated 0.967630031
DendriticCells SP ID treated 1.195241574
Macrophages SP ID treated 0.843400255
Monocytes SP ID treated 1.081566841
NKCells SP ID treated 1.385974647
Neutrophils SP ID treated 1.053435786
StemCells SP ID treated 0.853297563
StromalCells SP ID treated 0.942731303
abTcells SP ID treated 1.280073915
gdTCells SP ID treated 1.002322158


How can I achieve that in R?

Answer

If you spread to wide form, it's pretty trivial:

library(tidyr)
library(dplyr)

df %>% spread(sample, `mean(score)`) %>% 
    mutate(pairwise_division = `SP ID treated` / `SP ID control`)

##          celltype SP ID control SP ID treated pairwise_division
## 1          Bcells    0.16095354    0.15574347         0.9676300
## 2  DendriticCells    0.10478805    0.12524704         1.1952416
## 3     Macrophages   -0.15966565   -0.13466205         0.8434003
## 4       Monocytes    0.19624944    0.21225689         1.0815668
## 5         NKCells    0.05326683    0.07382647         1.3859746
## 6     Neutrophils    0.15182848    0.15994155         1.0534358
## 7       StemCells   -0.14128324   -0.12055664         0.8532976
## 8    StromalCells    0.19651865    0.18526428         0.9427313
## 9        abTcells    0.06546412    0.08379891         1.2800739
## 10       gdTCells    0.14511158    0.14544855         1.0023222

Note that you should probably fix your column names so you don't have to use backticks so often.

To get precisely the desired result, gather back to long, filter to just treated rows, and select the desired columns:

df %>% spread(sample, `mean(score)`) %>% 
    mutate(pairwise_division = `SP ID treated` / `SP ID control`) %>% 
    gather(sample, `mean(score)`, starts_with('SP')) %>% 
    filter(sample == 'SP ID treated') %>% 
    select(celltype, sample, pairwise_division)

##          celltype        sample pairwise_division
## 1          Bcells SP ID treated         0.9676300
## 2  DendriticCells SP ID treated         1.1952416
## 3     Macrophages SP ID treated         0.8434003
## 4       Monocytes SP ID treated         1.0815668
## 5         NKCells SP ID treated         1.3859746
## 6     Neutrophils SP ID treated         1.0534358
## 7       StemCells SP ID treated         0.8532976
## 8    StromalCells SP ID treated         0.9427313
## 9        abTcells SP ID treated         1.2800739
## 10       gdTCells SP ID treated         1.0023222

Equivalent versions are possible in base and data.table, if you prefer.

Comments