anthr - 1 month ago
R Question

Create new R dataframe column based on conditions in other columns

I have a dataframe which has a date column, a column of ints (labelled value in the example below), and 12 other numeric columns, each corresponding to a month and labelled X1 (Jan) through X12 (Dec).

It looks something like:

date_var value X1 X2 X3 ... X12
2016-01-01 100 1212 4161 9080 ... 383
2016-02-01 150 1212 4161 9080 ... 383
2016-03-01 150 1212 4161 9080 ... 383


What I'd like to do is create a new column, let's call it Z, which corresponds to the number in the value column divided by the appropriate monthly value.

For example, in the table above, Z for the 2016-01-01 entry would equal 100/1212, whereas the 2016-02-01 entry would instead divide by X2 for Feb, and 2016-03-01 would have value divided by X3:

date_var value X1 X2 X3 ... X12 Z
2016-01-01 100 1212 4161 9080 ... 383 0.0825
2016-02-01 150 1212 4161 9080 ... 383 0.0360
2016-03-01 150 1212 4161 9080 ... 383 0.0165


I've tried various approaches along the lines of attempting to divide value by df[paste("X", month(df$date_var), sep = '')], although this returned a list rather than working element-wise, so it obviously isn't the correct approach.

Answer

Another good way uses the dplyr and tidyr packages (plus lubridate for month()). It takes the common R approach of converting your information to long data frame format (i.e. the same kind of information in the same column, here all of your X1-X12 values) and then uses a filter condition to keep only the month value that matches the month in your date variable:

library(dplyr)
library(tidyr)
library(lubridate)

# test data frame (code from parksw3)
data <- tibble(
  date_var = as.Date(c("2016-01-01", "2016-02-01", "2016-03-01")),
  value = c(100, 150, 150),
  X1 = rep(1212, 3),
  X2 = rep(4161, 3),
  X3 = rep(9080, 3),
  X12 = rep(383, 3)
) 

# calculate the resulting Z column
result <- data %>% 
  # gather all the month (X1-X12) values into long format 
  # with month_var and month_value as key/value pair
  gather(month_var, month_value, starts_with("X")) %>% 
  # only consider the month_value for the month_var that matches the date's month
  filter(month_var == paste0("X", month(date_var))) %>% 
  # calculate the derived quantity
  mutate(Z = value/month_value)

print(result)

##     date_var value month_var month_value          Z
##       <date> <dbl>     <chr>       <dbl>      <dbl>
## 1 2016-01-01   100        X1        1212 0.08250825
## 2 2016-02-01   150        X2        4161 0.03604903
## 3 2016-03-01   150        X3        9080 0.01651982
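
A side note on gather(): in tidyr 1.0.0 and later it is superseded by pivot_longer(). The following is a minimal sketch of the same long-format-then-filter idea written with pivot_longer(), assuming a recent tidyr is installed. The name result_longer is only illustrative, and the test data is rebuilt with tibble() so the block runs on its own.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# same test data as above, rebuilt so this block is self-contained
data <- tibble(
  date_var = as.Date(c("2016-01-01", "2016-02-01", "2016-03-01")),
  value = c(100, 150, 150),
  X1 = rep(1212, 3),
  X2 = rep(4161, 3),
  X3 = rep(9080, 3),
  X12 = rep(383, 3)
)

# result_longer is an illustrative name, not from the original answer
result_longer <- data %>%
  # reshape the month columns into month_var/month_value pairs
  pivot_longer(starts_with("X"),
               names_to = "month_var",
               values_to = "month_value") %>%
  # keep only the row whose month column matches the date's month
  filter(month_var == paste0("X", month(date_var))) %>%
  # calculate the derived quantity
  mutate(Z = value / month_value)

print(result_longer)
```

This should give the same three Z values as the gather() version; only the reshaping call differs.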

If you want, you can merge the Z column back into your original data frame:

data_all <- left_join(data, select(result, date_var, Z), by = "date_var")

print(data_all)

##     date_var value    X1    X2    X3   X12          Z
##       <date> <dbl> <dbl> <dbl> <dbl> <dbl>      <dbl>
## 1 2016-01-01   100  1212  4161  9080   383 0.08250825
## 2 2016-02-01   150  1212  4161  9080   383 0.03604903
## 3 2016-03-01   150  1212  4161  9080   383 0.01651982
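
One caveat with joining on date_var: if the same date appears in more than one row, the join can produce duplicated rows. As a possible alternative, here is a rough base R sketch that adds Z directly to the original data frame by picking one cell per row with matrix indexing, which is the element-wise lookup the question was after. It assumes the month columns are named exactly X followed by the month number; df, month_col, month_cols and divisor are illustrative names, not from the original post.

```r
# same test data as above, as a plain data.frame
df <- data.frame(
  date_var = as.Date(c("2016-01-01", "2016-02-01", "2016-03-01")),
  value = c(100, 150, 150),
  X1 = rep(1212, 3),
  X2 = rep(4161, 3),
  X3 = rep(9080, 3),
  X12 = rep(383, 3)
)

# name of the month column each row should divide by, e.g. "X1" for January
month_col <- paste0("X", as.integer(format(df$date_var, "%m")))

# numeric matrix holding only the month columns (assumed naming: X1..X12)
month_cols <- as.matrix(df[grep("^X[0-9]+$", names(df))])

# a two-column index matrix selects one cell per row:
# row i, and the column whose name is month_col[i]
divisor <- month_cols[cbind(seq_len(nrow(df)),
                            match(month_col, colnames(month_cols)))]

# element-wise division; rows whose month column is missing get NA
df$Z <- df$value / divisor

print(df)
```

Because nothing is reshaped or joined, the row count and row order of df stay exactly as they were.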