bli bli - 1 year ago 98
R Question

Extract and paste together multiple columns of a data-frame-like object using a vector of column names

I have an object (the variable rld in the code below) which looks a bit like a "data.frame" (see further down the post for details), in that its columns can be accessed by name using [[.

I have a vector, groups, containing the names of some of its columns (three in the example below).

I generate strings by combining, row by row, the elements of these columns, as follows:

paste(rld[[groups[1]]], rld[[groups[2]]], rld[[groups[3]]], sep="-")

I would like to generalize this so that I don't need to know how many elements groups contains.

The following attempt fails:

> paste(rld[[groups]], collapse="-")
Error in normalizeDoubleBracketSubscript(i, x, exact = exact, error.if.nomatch = FALSE) :
attempt to extract more than one element
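The failure comes from how [[ treats a vector subscript: in base R, x[[c("a", "b")]] is recursive indexing, equivalent to x[["a"]][["b"]], not a request for two columns. Selecting several columns at once is the job of single-bracket [. A minimal illustration on a plain list and on mtcars (a stand-in, not the DESeq2 object, whose own [[ method apparently rejects a vector subscript outright, hence the message above):

```r
# Recursive indexing: a length-2 character subscript walks down nested lists
nested <- list(a = list(b = 42))
nested[[c("a", "b")]]   # same as nested[["a"]][["b"]], i.e. 42

# Single brackets select several columns and return a smaller data.frame
cols <- c("mpg", "disp")
sub <- mtcars[cols]
names(sub)              # "mpg" "disp"
```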

Here is how I would do it in a functional style with a Python dictionary:

map("-".join, zip(*map(rld.get, groups)))

Is there a similar column-getter operator in R?
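A rough R counterpart of the Python one-liner, offered as a sketch: lapply() with [[ plays the role of map(rld.get, groups), and because paste() is vectorised over its arguments, handing it the list of columns through do.call() performs the zip-and-join step in a single call. Here mtcars stands in for the real object (an assumption); the sketch only relies on [[ accepting a single column name, which the question shows works on rld.

```r
# Stand-in for the real object: anything whose columns can be fetched with [[
rld <- mtcars[1:3, ]
groups <- c("mpg", "cyl", "gear")

# map(rld.get, groups): fetch each column into a plain, unnamed list
columns <- lapply(groups, function(g) rld[[g]])

# zip(*...) + "-".join: paste() is vectorised, so one call pastes row-wise
combined <- do.call(paste, c(columns, sep = "-"))
combined   # "21-6-4" "21-6-4" "22.8-4-4"
```

Since lapply() always returns a plain list, this form does not rely on the object supporting [ with a vector of names, nor on its c() method.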

As suggested in the comments, here is the output of: (I could not paste it directly, since it is huge.)

This was generated using the DESeq2 bioinformatics package, more precisely by doing something similar to what is described on page 28 of this document:

DESeq2 can be installed from Bioconductor as follows:


Reproducible example

One of the solutions worked in interactive mode, but failed with the following error when the code was put in a function inside a package:

Error in do.call(function (...) paste(..., sep = "-"), colData(rld)[groups]) :
second argument must be a list

After some tests, it appears that the problem does not occur if the function is defined in the main calling script, as follows:


library(DESeq2)

lib_names <- c(...)  # contents not shown in the original post
file_names <- paste(...)  # contents not shown in the original post

wt <- "WT"
mut <- "mut"
genotypes <- rep(c(wt, mut), times=3)
replicates <- c(rep("1", times=2), rep("2", times=2), rep("3", times=2))

sample_table <- data.frame(
    lib = lib_names,
    file_name = file_names,
    genotype = genotypes,
    replicate = replicates)

dds_raw <- DESeqDataSetFromHTSeqCount(
    sampleTable = sample_table,
    directory = ".",
    design = ~ genotype)

# Remove genes with too few read counts
dds <- dds_raw[ rowSums(counts(dds_raw)) > 1, ]
dds$group <- factor(dds$genotype)
design(dds) <- ~ replicate + group
dds <- DESeq(dds)

test_do_paste <- function(dds) {
    # All colData columns except the last two
    groups <- head(colnames(colData(dds)), -2)
    rld <- rlog(dds, blind=FALSE)
    stopifnot(all(groups %in% names(colData(rld))))
    # Paste the selected columns together row by row
    combined_names <- do.call(
        function (...) paste(..., sep = "-"),
        colData(rld)[groups])
    combined_names
}

# This fails (with the same function put in a package)

The error occurs when the function is packaged as in

Data used in the example:

I will post this issue as a separate question.

Although I have an answer to my initial question, I'm still interested in alternative solutions for the "column extraction using a vector of column names" issue.
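One further alternative, offered as a sketch rather than a tested DESeq2 recipe: fold the columns together pairwise with Reduce(). It needs only [[ with a single column name, which the question shows works on rld. mtcars stands in for the real object here (an assumption).

```r
# Stand-in for the real object
rld <- mtcars[1:3, ]
groups <- c("mpg", "cyl", "gear")

# Start from the first column (as character, so a single group still yields
# strings) and paste each following column onto the accumulated result
combined <- Reduce(
  function(acc, g) paste(acc, rld[[g]], sep = "-"),
  groups[-1],
  as.character(rld[[groups[1]]])
)
combined   # "21-6-4" "21-6-4" "22.8-4-4"
```

This calls paste() once per extra column instead of once overall, which is irrelevant for a handful of columns.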

Answer

We may use either of the following:

do.call(function (...) paste(..., sep = "-"), rld[groups])
do.call(paste, c(rld[groups], sep = "-"))

We can consider a small, reproducible example:

rld <- mtcars[1:5, ]
groups <- names(mtcars)[c(1,3,5,6,8)]
do.call(paste, c(rld[groups], sep = "-"))
#[1] "21-160-3.9-2.62-0"     "21-160-3.9-2.875-0"    "22.8-108-3.85-2.32-1" 
#[4] "21.4-258-3.08-3.215-1" "18.7-360-3.15-3.44-0"

Note that it is your responsibility to ensure that all(groups %in% names(rld)) is TRUE; otherwise you will get a "subscript out of bounds" or "undefined columns selected" error.
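A small helper can make that check explicit and also guard against a subtler pitfall: with c(rld[groups], sep = "-"), the column names become argument names in the paste() call, so a column that happens to be called sep or collapse would be matched to paste()'s own arguments. The name paste_columns is hypothetical, and this sketch has only been exercised on plain data.frames.

```r
# Hypothetical helper: paste the named columns of df together row by row
paste_columns <- function(df, cols, sep = "-") {
  # Fail early with a readable message instead of a subscript error
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("columns not found: ", paste(missing_cols, collapse = ", "))
  }
  # unname() stops column names from being matched to paste()'s own
  # arguments (e.g. a column called "sep" or "collapse")
  column_list <- unname(as.list(df[cols]))
  do.call(paste, c(column_list, sep = sep))
}

# A column literally named "sep" would otherwise clash with paste()'s sep
df <- data.frame(sep = c("a", "b"), x = 1:2)
paste_columns(df, c("sep", "x"))   # returns c("a-1", "b-2")
```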

(I am copying your comment as a follow-up.)

It seems the methods you propose don't work directly on my object. However, the package I'm using provides a colData function that returns something closer to a data.frame:

> class(colData(rld))
[1] "DataFrame"
attr(,"package")
[1] "S4Vectors"

do.call(function (...) paste(..., sep = "-"), colData(rld)[groups]) works, but do.call(paste, c(colData(rld)[groups], sep = "-")) fails with an error message I don't understand (as too often with R...):

> do.call(paste, c(colData(rld)[groups], sep = "-"))
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘mcols’ for signature ‘"character"’