ecolog ecolog - 1 month ago 15
R Question

r - lapply divides a column by an integer value from different dataset, unexpected result

I have two data.frames, one with genotype counts and one with a number that I need to normalize my counts from the first dataset.

countsdata=data.frame(genotype1=rep(c(10,20,30,40),each=1),
genotype2=rep(c(100,200,300,400),each=1),
genotype3=rep(c(40,50,60,70),each=1),
genotype4=rep(c(40,50,60,70),each=1)
)
coldata = data.frame(Group =c('genotype1', 'genotype2', 'genotype3', 'genotype4'),
Treatment = rep(c("control","treated"),each = 2),
Norm=rep(c(1,2,5,5)))


I made sure my variables don't have factors

factorsCharacter <- function(d) modifyList(d, lapply(d[, sapply(d, is.factor)],
as.character))
coldata=factorsCharacter(coldata)


Then I see that lapply loops through my counts, one column at the time and through my coldata that contains the normalization value (Norm). All is looking good, until I combined the two action in the same step

> lapply(coldata['Group'],function(group_i){group_i})
$Group
[1] "genotype1" "genotype2" "genotype3" "genotype4"

> lapply(coldata['Group'],function(group_i){countsdata[,group_i]})
$Group
genotype1 genotype2 genotype3 genotype4
1 10 100 40 40
2 20 200 50 50
3 30 300 60 60
4 40 400 70 70

> lapply(coldata['Group'],function(group_i){as.integer(coldata[coldata$Group==group_i,'Norm'])})
$Group
[1] 1 2 5 5

> lapply(coldata['Group'],function(group_i){
+ countsdata[,group_i]/as.integer(coldata[coldata$Group==group_i,'Norm'])
+ })
$Group
genotype1 genotype2 genotype3 genotype4
1 10 100 40 40
2 10 100 25 25
3 6 60 12 12
4 8 80 14 14


Here the result is not what I was expecting (dividing each column by its normalization number). After further inspection I noticed it's normalizing by rows, in other words it's normalizing across different columns, which shouldn't be the case as I am looping through one column at the time. I am probably missing a basic concept but looking through other SO posts didn't find anything I could use. My goal is to fix the code to make the right calculation but I also would like to understand why this code above is not working. Thanks so much.

Answer

The problem is in using [ and not [[. So, instead of looping through each of the elements in 'Group' column, we have a list of length 1 with all the elements. So, either use coldata[, 'Group'] or coldata[['Group']] or coldata$Group for looping.

countsdataNew <- countsdata
countsdataNew[] <- lapply(coldata[['Group']],function(group_i)
                   countsdata[,group_i]/coldata$Norm[coldata$Group==group_i])
countsdataNew
#  genotype1 genotype2 genotype3 genotype4
#1        10        50         8         8
#2        20       100        10        10
#3        30       150        12        12
#4        40       200        14        14

If the column name in 'countsdata' and 'Group' column from 'countsdata' are in the same order, we can do this easily with Map

Map(`/`, countsdata, coldata$Norm)

Or just replicate the 'Norm' and do a simple division

countsdata/coldata$Norm[col(countsdata)]

Or with sweep

sweep(countsdata, 2, coldata$Norm, "/")
Comments