ka chun chung - 2 months ago
R Question

R gsub column names in all data frames within a list

I have a list of data frames, and I want to strip a common prefix ("value.TCGA.") from the column names of every data frame in the list, but I can't get it to work.

list:
[1]
cpg value.TCGA.06.5415.01A value.TCGA.02.0003.01A value.TCGA.16.1062.01A
cg02726808 0.934641544 NA NA
cg04243127 0.8828403 NA NA
[2]
cpg value.TCGA.QH.A6CV.01A value.TCGA.E1.A7Z4.01A value.TCGA.E1.5303.01A
cg02726808 0.938556343 0.92163563 0.959269597
cg04243127 0.886928811 0.842963126 0.937700666
[N]
.....

Desired output:
list:
[1]
cpg 06.5415.01A 02.0003.01A 16.1062.01A
cg02726808 0.934641544 NA NA
cg04243127 0.8828403 NA NA
[2]
cpg QH.A6CV.01A E1.A7Z4.01A E1.5303.01A
cg02726808 0.938556343 0.92163563 0.959269597
cg04243127 0.886928811 0.842963126 0.937700666
[N]
.....


I tried to write the following:

lapply(lst, function(x) { gsub("value.TCGA.", "", colnames(lst[[x]]))})


Error in RStudio:

Error in llis1[[xy]] : invalid subscript type 'list'
Called from: is.data.frame(x)
Browse[1]>


I don't understand what this error means. Thanks for your help.

Answer

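The error occurs because lapply passes each data frame itself to the function, not its index, so x is already a data frame and lst[[x]] tries to subset the list with a data frame. Work on x directly instead: setNames returns the data frame with its column names replaced by the output of sub. The dots are escaped and the pattern is anchored with ^ so that only a literal leading "value.TCGA." is removed.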

lapply(lst, function(x) setNames(x, sub("^value\\.TCGA\\.", "", names(x))))
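
A minimal reproducible sketch of this approach, using a two-element stand-in for lst built from the values in the question. The names df1, df2 and renamed are made up for illustration. Note that lapply returns a new list rather than modifying lst, so the result has to be assigned for the change to persist.

```r
# Stand-in for the list in the question (values copied from the post).
df1 <- data.frame(
  cpg = c("cg02726808", "cg04243127"),
  s1  = c(0.934641544, 0.8828403),
  s2  = c(NA, NA),
  s3  = c(NA, NA)
)
names(df1) <- c("cpg", "value.TCGA.06.5415.01A",
                "value.TCGA.02.0003.01A", "value.TCGA.16.1062.01A")

df2 <- data.frame(
  cpg = c("cg02726808", "cg04243127"),
  s1  = c(0.938556343, 0.886928811),
  s2  = c(0.92163563, 0.842963126),
  s3  = c(0.959269597, 0.937700666)
)
names(df2) <- c("cpg", "value.TCGA.QH.A6CV.01A",
                "value.TCGA.E1.A7Z4.01A", "value.TCGA.E1.5303.01A")

lst <- list(df1, df2)

# Inside lapply, x is one data frame from the list, so use names(x) directly.
# The escaped, anchored pattern removes only a literal leading "value.TCGA.".
renamed <- lapply(lst, function(x) {
  setNames(x, sub("^value\\.TCGA\\.", "", names(x)))
})

names(renamed[[1]])
# "cpg" "06.5415.01A" "02.0003.01A" "16.1062.01A"
```

To overwrite the original list, assign the result back to lst instead of renamed.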

Another option is substring, provided every column name except the first starts with 'value.TCGA.'. The prefix is 11 characters long, so the part to keep starts at position 12.

lapply(lst, function(x) setNames(x, c("cpg", substring(names(x)[-1], 12))))
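
If hard-coding 12 and "cpg" feels fragile, the same idea can be sketched with the offset computed from the prefix and a check that a name really starts with it. This is only an illustration: strip_prefix, prefix, start, demo and out are hypothetical names, and startsWith needs R 3.3.0 or later.

```r
prefix <- "value.TCGA."
start  <- nchar(prefix) + 1L   # 12: first character after the prefix

# Hypothetical helper: trim the prefix only from names that actually have it,
# so the first column (and any other unprefixed column) is left untouched.
strip_prefix <- function(df) {
  nm <- names(df)
  has_prefix <- startsWith(nm, prefix)
  nm[has_prefix] <- substring(nm[has_prefix], start)
  setNames(df, nm)
}

# Small stand-in for one element of the list in the question.
demo <- data.frame(
  cpg = c("cg02726808", "cg04243127"),
  s1  = c(0.938556343, 0.886928811),
  s2  = c(0.92163563, 0.842963126)
)
names(demo) <- c("cpg", "value.TCGA.QH.A6CV.01A", "value.TCGA.E1.A7Z4.01A")

out <- lapply(list(demo), strip_prefix)
names(out[[1]])
# "cpg" "QH.A6CV.01A" "E1.A7Z4.01A"
```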