fugu fugu - 1 year ago 51
R Question

Perform calculation on each row of a single column from data frame

I have a data frame:


  sample chrom     pos ref alt tri trans decomposed_tri grouped_trans    type    feature       gene
1      1     1  659105   G   A CGT   G>A            ACG           C>T somatic     intron         ds
2      1     1 1227592   A   G CAC   A>G            GTG           T>C somatic     intron    CG42329
3      1     1 1775341   T   G CTG   T>G            CTG           T>G somatic intergenic intergenic
4      1     1 1775552   T   C GTT   T>C            GTT           T>C somatic intergenic intergenic
5      1     1 1812639   T   G GTG   T>G            GTG           T>G somatic intergenic intergenic
6      1     1 1812641   G   A GGA   G>A            TCC           C>T somatic intergenic intergenic

And a list of genes with their lengths:

[1] 1553

[1] 8019

[1] 10010

[1] 1385

[1] 1974

[1] 1933

And I want to:

a) Calculate the number of times you would expect to see a gene in this list, given the length of the gene (from the list of gene lengths) and the length of the genome

b) Calculate the number of times we actually see each gene

c) Calculate the ratio of observed/expected

d) Return this as a data frame

Here's what I'm doing:

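snv_count <- nrow(data)        # total number of observations
hit_genes <- table(data$gene)  # the number of times I find each gene in my data
cat("gene", "observed", "expected", "fc", "\n")

for (g in levels(data$gene)) {
  # ... gene_expect (expected count for gene g) and fc (observed / expected)
  #     are calculated here, as described in (a) and (c) ...
  cat(g, hit_genes[g], gene_expect, fc, "\n")
}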

gene observed expected fc
128up 5 1.493344 3.348189
18SrRNA-Psi:CR45861 3 0.5076489 5.909596
C442219 4 0.03778505 105.862

This works. However, I'm running this in a function and want to return a data frame. How can I build a data frame row by row in the for loop? I've tried initialising an empty data frame before the loop:

df <- data.frame(gene = character(), observed = numeric(), expected = numeric(), fc = numeric())

and then building row by row in the loop:

enriched <- rbind(df, data.frame(gene = g, observed = hit_genes[g], expected = gene_expect, fc = fc))

But I get the following error:

Error in data.frame(gene = g, observed = hit_genes[g], expected = gene_expect, :
arguments imply differing number of rows: 1, 0

A further question: should I be using something other than a loop to achieve this?

Answer

Maybe with ?lapply. (Untested.)

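# 'fun' returns a one-row data frame with four columns
fun <- function(g) {
    # gene_expect and fc for gene g must be available here (see the note below)
    data.frame(gene = g, observed = as.vector(hit_genes[g]), expected = gene_expect, fc = fc)
}

enriched <- lapply(levels(data$gene), fun)
enriched <- do.call(rbind, enriched)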

Note that this assumes that the objects referred to in the function fun (hit_genes, gene_expect and fc) are available, i.e., in the global environment.
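
For something you can actually run, here is a minimal self-contained sketch of the same lapply + do.call(rbind, ...) idea. Everything in it is made up for illustration: the toy data, the gene names, the gene_lengths list, the genome_length value and the helper name enrich_row are all hypothetical, and the "expected" formula (total hits times gene length divided by genome length) is my assumption about what (a) in the question means, not something taken from the original code.

```r
# Toy stand-ins (all hypothetical): a tiny 'data' frame, a named list of
# gene lengths, and a made-up genome length.
data <- data.frame(
  gene = factor(c("geneA", "geneA", "geneB", "geneC", "geneC", "geneC"))
)
gene_lengths  <- list(geneA = 1000, geneB = 4000, geneC = 500)
genome_length <- 100000

snv_count <- nrow(data)        # total number of observations
hit_genes <- table(data$gene)  # observed count per gene

# Build a single-row data frame for one gene.
enrich_row <- function(g) {
  # as.vector() strips the table class so we get a plain integer
  observed <- as.vector(hit_genes[g])
  # Assumed formula: hits expected if SNVs fell uniformly along the genome
  expected <- snv_count * gene_lengths[[g]] / genome_length
  data.frame(gene = g, observed = observed, expected = expected,
             fc = observed / expected, stringsAsFactors = FALSE)
}

# One row per gene, stacked into a single data frame.
enriched <- do.call(rbind, lapply(levels(data$gene), enrich_row))
print(enriched)
```

Because each call builds its own complete row, there is no empty data frame to initialise and nothing to grow inside a loop. As for the error in the question, "arguments imply differing number of rows: 1, 0" generally means one of the values passed to data.frame() has length zero. In the sketch above that would happen if a gene were missing from gene_lengths, since gene_lengths[[g]] would then be NULL, so it may be worth checking for that case in the real data.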
