Brian Brian - 5 months ago 21
R Question

Read BAM with tags to tbl / recursive dplyr::bind_cols for list of lists

I would like to read in a BAM file including some tags, then convert this to a tibble for further processing.

Generally, this can be achieved pretty simply:

library(Rsamtools)
library(tidyverse)
map.info <- c("rname", "strand", "pos")
map.params <- ScanBamParam(what = map.info)
bam <- scanBam(bam.file, param = map.params)


scanBam returns a list with named vectors rname, strand and pos, which can be easily joined using dplyr::bind_cols(bam). However, if I am also interested in the MD tag, I need to do the following:

map.params <- ScanBamParam(what = map.info, tag = c("MD"))
bam <- scanBam(bam.file, param = map.params)


Now, bam is a list of lists, with named vectors rname, strand and pos as previously, but also another element, tag, which is itself a list with one named vector, MD.

dplyr::bind_cols cannot handle this nested list of lists and throws an error, but as.data.frame(bam) works.

TL;DR



A toy example boiling down to the question at heart:

> df.list <- list(a = 1:2, b = 3:4, c = 5:6)
> df.nest <- list(a = 1:2, b = 3:4, c = 5:6, d = list( e = 7:8 ))
> dplyr::bind_cols(df.list)
# A tibble: 2 x 3
      a     b     c
  <int> <int> <int>
1     1     3     5
2     2     4     6
> dplyr::bind_cols(df.nest)
Error in cbind_all(x) : Argument 4 must be length 2, not 1
> as.data.frame(df.nest)
  a b c e
1 1 3 5 7
2 2 4 6 8


Is there a way to apply bind_cols recursively to a nested list?

Tentative answer



Inspired by @mt1022's answer, and without further inspection of the Rsamtools source code, it seems that scanBam output, despite being formatted very similarly to the toy example, does not behave like it.

However, as we know what we're putting in, the following should also produce a fully merged tibble:

map.info <- c("rname", "strand", "pos")
map.params <- ScanBamParam(what = map.info, tag = c("MD", "NM"))
bam <- scanBam(bam.file, param = map.params)
# scanBam() returns one list element per range; bam[[1]] is the first one
bam.tbl <- bind_cols(do.call(bind_cols, bam[[1]][map.info]),
                     do.call(bind_cols, bam[[1]]$tag))


It seems more hacky than what I had hoped for (or expected), but it works.
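
One likely reason the scanBam output behaves differently from the toy example is that scanBam returns one inner list per queried range, so the fields sit one level deeper than in df.nest. The sketch below illustrates the same fields-plus-tags approach applied to every range. It does not read a real BAM file: bam.mock is a hand-written stand-in for scanBam output, and range_to_tbl is a hypothetical helper name, so treat it as an illustration under those assumptions rather than a tested recipe for Rsamtools.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for scanBam() output: one inner list per range,
# each holding the requested fields plus a nested list of tags.
bam.mock <- list(
  list(rname = c("chr1", "chr1"), strand = c("+", "-"), pos = c(100L, 250L),
       tag = list(MD = c("36", "20A15"), NM = c(0L, 1L))),
  list(rname = "chr2", strand = "+", pos = 42L,
       tag = list(MD = "36", NM = 0L))
)

# Hypothetical helper: build one tibble per range from the core fields and the tags.
range_to_tbl <- function(rec, fields = c("rname", "strand", "pos")) {
  bind_cols(as_tibble(rec[fields]), as_tibble(rec$tag))
}

# Convert every range, then stack the per-range tibbles row-wise.
bam.tbl <- bind_rows(lapply(bam.mock, range_to_tbl))
```

With real data, rname and strand come back as factors rather than character vectors, which may need extra care when stacking ranges.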

Answer Source

Inspired by this example in the manual for bind_cols:

# You can mix vectors and data frames:
bind_rows(
  c(a = 1, b = 2),
  data_frame(a = 3:4, b = 5:6),
  c(a = 7, b = 8)
)

I think we can try:

do.call(bind_cols, df.nest)
# # A tibble: 2 x 4
#       a     b     c     e
#   <int> <int> <int> <int>
# 1     1     3     5     7
# 2     2     4     6     8
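
If the nesting goes deeper than one level, or if bind_cols treats list arguments differently in another dplyr version, one option is to flatten the list explicitly before building the tibble. The following is a minimal sketch, assuming every level of the list is named. flatten_cols is a hypothetical helper, not part of dplyr or Rsamtools.

```r
library(tibble)

# Hypothetical helper: recursively pull the leaves of a nested, named list
# up into one flat named list. Unnamed lists are kept as they are, so they
# can still become list-columns.
flatten_cols <- function(x) {
  out <- list()
  for (i in seq_along(x)) {
    el <- x[[i]]
    if (is.list(el) && !is.null(names(el))) {
      # Named sub-list: recurse and append its leaves
      out <- c(out, flatten_cols(el))
    } else {
      # Leaf: keep it under its own name
      out[[names(x)[i]]] <- el
    }
  }
  out
}

df.nest <- list(a = 1:2, b = 3:4, c = 5:6, d = list(e = 7:8))
flat <- flatten_cols(df.nest)
tbl <- as_tibble(flat)
```

Note that the names of intermediate levels (d here) are dropped, which matches what as.data.frame(df.nest) shows in the question. Leaves that share a name across levels would collide, so they would need renaming first.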