Kodiakflds - 1 month ago
R Question

How to divide multiple columns by a single column in an R data frame

I've looked around for an answer and have not quite come up with a solution.

I am trying to divide multiple (~60) columns of my data frame (species counts) by a single column in the same data frame (unit of sample effort).

I was able to come up with the solution below, but it is messier than I would prefer. As written, I could accidentally run the last line of code twice and corrupt my values by dividing twice.

Here is a brief example demonstrating the solution I used. Any suggestions for something cleaner?

# Short data.frame with some count data
# Hours is the sampling effort
counts = data.frame(sp1 = sample(1:10, 10), sp2 = sample(1:10, 10),
                    sp3 = sample(1:10, 10), sp4 = sample(1:10, 10),
                    Hours = rnorm(10, 4, 1))


# Get my 'species' names
names = colnames(counts)[1:4]

# This seems messy, and if I run the second line twice I will corrupt my values.
# I want to divide all 'sp' columns by the single 'Hours' column.
rates = counts
rates[names] = rates[, names] / rates[, 'Hours']


P.S. I've been piping with %>%, so if anyone has a solution that transforms the 'counts' data.frame without creating a new data.frame, that would be swell!

P.P.S. I suspect one of Hadley's functions may have what I need (e.g. mutate_each?), but I have not been able to figure it out.

Answer

I really don't see what is wrong with your base R approach; it is very clean. If you are worried about accidentally running the second line multiple times without re-running the first, just read from the original counts columns on the right-hand side, so that repeating the assignment always gives the same result. I would make these small adjustments:

rates = counts
rates[names] = counts[names] / counts[["Hours"]]

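Using [ and [[ guarantees the data types regardless of the length of names: counts[names] is always a data frame, even when names has a single element, and counts[["Hours"]] is always a plain vector.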
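
A minimal sketch of both points, assuming a toy counts data frame shaped like the one in the question. The seed, the two-species layout, and the helper names first_run and one_name are made up for illustration:

```r
set.seed(1)  # arbitrary seed, only so the toy data are reproducible
counts <- data.frame(sp1 = sample(1:10, 10), sp2 = sample(1:10, 10),
                     Hours = rnorm(10, 4, 1))
names <- c("sp1", "sp2")

rates <- counts
rates[names] <- counts[names] / counts[["Hours"]]
first_run <- rates

# Re-running the assignment reads from `counts`, not `rates`,
# so the values are not divided a second time.
rates[names] <- counts[names] / counts[["Hours"]]
identical(rates, first_run)   # TRUE

# `[` with a single name still returns a data frame ...
one_name <- "sp1"
class(counts[one_name])       # "data.frame"
# ... whereas `[ , ]` drops a single column to a plain vector by default.
class(counts[, one_name])     # "integer"
# `[[` always extracts the column itself.
class(counts[["Hours"]])      # "numeric"
```

The division works column by column because the Hours vector has one value per row, so it lines up with each species column.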

I do like dplyr, but it seems messier for this:

# This works if you want everything except the Hours column
rates = counts %>% mutate_each(funs(./Hours), vars = -Hours)

# This sort of works if you want to use the names vector
rates = counts %>% mutate_at(funs(./Hours), .cols = names)
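
The two calls above were written against an older dplyr interface; mutate_each() and funs() have since been deprecated. In dplyr 1.0.0 or later, the same idea can be sketched with across(). This is an assumed modern equivalent rather than part of the original answer, and the toy data and the names rates_all and rates_named are illustrative only:

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the toy data are reproducible
counts <- data.frame(sp1 = sample(1:10, 10), sp2 = sample(1:10, 10),
                     Hours = rnorm(10, 4, 1))
names <- c("sp1", "sp2")

# Divide every column except Hours by Hours
rates_all <- counts %>% mutate(across(-Hours, ~ .x / Hours))

# Divide only the columns listed in the `names` vector
rates_named <- counts %>% mutate(across(all_of(names), ~ .x / Hours))
```

Because the pipeline starts from counts and returns a new data frame, running it repeatedly gives the same result, which addresses the double-division worry from the question.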