user1741021 user1741021 - 25 days ago 16
R Question

How can I divide one column of a data frame by another?

I want to divide one column by another to get the per-person time. How can I do this? I couldn't find anything on how to divide columns.

Here is some data that I want to use

min count2.freq
263807.0 1582
196190.5 1016
586689.0 3479


In the end I want to add a third column that holds the result of
min / count2.freq


e.g.
263807.0 / 1582 = 166.76

Answer

There are a plethora of ways in which this can be done. The problem is how to make R aware of the locations of the variables you wish to divide.

Assuming the data are in a data frame d, created like this:

d <- read.table(text = "263807.0    1582
196190.5    1016
586689.0    3479
")
names(d) <- c("min", "count2.freq")
> d
       min count2.freq
1 263807.0        1582
2 196190.5        1016
3 586689.0        3479

My preferred way

To add the desired division as a third variable I would use transform()

> d <- transform(d, new = min / count2.freq)
> d
       min count2.freq      new
1 263807.0        1582 166.7554
2 196190.5        1016 193.1009
3 586689.0        3479 168.6373

The basic R way

If you are doing this inside a function (i.e. you are programming), it is best to avoid the sugar shown above and index the data frame directly. In that case any of these would do what you want:

## 1. via `[` and character indexes
d[, "new"] <- d[, "min"] / d[, "count2.freq"]

## 2. via `[` with numeric indices
d[, 3] <- d[, 1] / d[, 2]

## 3. via `$`
d$new <- d$min / d$count2.freq

All of these can be used at the prompt too, but which is easier to read?

d <- transform(d, new = min / count2.freq)

or

d$new <- d$min / d$count2.freq ## or any of the above examples

Hopefully you think like I do and the first version is better ;-)

The reason we don't use the syntactic sugar of transform() et al. when programming is how they do their evaluation: they look for the named variables in the data frame first and then in the calling environment. At the top level (at the prompt, working interactively) transform() et al. work just fine. But buried in function calls, or within a call to one of the apply() family of functions, they can and often do break.
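As a rough sketch of the indexing approach inside a function, assuming the column names are passed in as character strings (the helper name add_ratio and its arguments are made up for illustration, not part of base R):

```r
# Hypothetical helper: divide one column by another and store the result.
# Column names arrive as character strings, so `[[` is used to look them up
# explicitly instead of relying on transform()'s variable lookup.
add_ratio <- function(df, num, den, out = "new") {
  df[[out]] <- df[[num]] / df[[den]]
  df
}

# Same data as in the question
d <- data.frame(min = c(263807.0, 196190.5, 586689.0),
                count2.freq = c(1582, 1016, 3479))

d2 <- add_ratio(d, "min", "count2.freq")
d2
```

Because the names are ordinary strings, the same function works for any pair of numeric columns, which is what you usually want when programming.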

Likewise, be careful using numeric indices (## 2. above); if you change the ordering of your data, you will select the wrong variables.
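A small sketch of how numeric indices can go wrong, assuming the same two columns simply arrive in the opposite order (the object names d_swapped, by_position and by_name are only for illustration):

```r
d <- data.frame(min = c(263807.0, 196190.5, 586689.0),
                count2.freq = c(1582, 1016, 3479))

# Same data, but with the two columns swapped
d_swapped <- d[, c("count2.freq", "min")]

# Positional indexing now silently computes count2.freq / min
by_position <- d_swapped[, 1] / d_swapped[, 2]

# Indexing by name still computes min / count2.freq
by_name <- d_swapped[, "min"] / d_swapped[, "count2.freq"]
```

No error or warning is raised for by_position; the numbers are just the wrong way round, which is why indexing by name is the safer habit.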

The preferred way if you don't need replacement

If you just want to do the division (rather than insert the result back into the data frame), then use with(), which lets you isolate the simple expression you wish to evaluate:

> with(d, min / count2.freq)
[1] 166.7554 193.1009 168.6373

This is again much cleaner code than the equivalent

> d$min / d$count2.freq
[1] 166.7554 193.1009 168.6373

as it explicitly states "using d, execute the code min / count2.freq". Your preference may differ from mine, so I have shown all the options.
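To illustrate the "no replacement" point, here is a minimal sketch showing that with() returns a plain vector and leaves d untouched (the name per_person is just an illustrative choice):

```r
d <- data.frame(min = c(263807.0, 196190.5, 586689.0),
                count2.freq = c(1582, 1016, 3479))

# The result is an ordinary numeric vector, one value per row of d
per_person <- with(d, min / count2.freq)

# d itself still has only its two original columns
names(d)

# The expression can also be wrapped in further calls, e.g. an overall mean
with(d, mean(min / count2.freq))
```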
