BonnieM BonnieM - 28 days ago 12
R Question

How to match dates in 2 data frames in R, then sum specific range of values up to that date?

I have two data frames: rainfall data collected daily and nitrate concentrations of water samples collected irregularly, approximately once a month. I would like to create a vector of values for each nitrate concentration that is the sum of the previous 5 days' rainfall. Basically, I need to match the nitrate date with the rain date, sum the previous 5 days' rainfall, then print the sum with the nitrate data.

I think I need to either make a

function
, a
for
loop, or use
tapply
to do this, but I don't know how. I'm not an expert at any of those, though I've used them in simple cases. I've searched for similar posts, but none get at this exactly. This one deals with summing by factor groups. This one deals with summing each possible pair of rows. This one deals with summing by
aggregate
.

Here are 2 example data frames:

# rainfall df
mm<- c(0,0,0,0,5, 0,0,2,0,0, 10,0,0,0,0)
date<- c(1:15)
rain <- data.frame(cbind(mm, date))
# b/c sums of rainfall depend on correct chronological order, make sure the data are in order by date.
rain[ do.call(order, list(rain$date)),]

# nitrate df
nconc <- c(15, 12, 14, 20, 8.5) # nitrate concentration
ndate<- c(6,8,11,13,14)
nitrate <- data.frame(cbind(nconc, ndate))


I would like to have a way of finding the matching rainfall date for each nitrate measurement, such as:

match(nitrate$date[i] %in% rain$date)


(Note: Will
match
work with
as.Date
dates?) And then sum the preceding 5 days' rainfall (not including the measurement date), such as:

sum(rain$mm[j-6:j-1]


And prints the sum in a new column in nitrate

print(nitrate$mm_sum[i])


To make sure it's clear what result I'm looking for, here's how to do the calculation 'by hand'. The first nitrate concentration was collected on day 6, so the sum of rainfall on days 1-5 is 5mm.

Many thanks in advance.

Answer

You were more or less there!

nitrate$prev_five_rainfall = NA 
for (i in 1:length(nitrate$ndate)) { 
    day = nitrate$ndate[i] 
    nitrate$prev_five_rainfall[i] = sum(rain$mm[(day-6):(day-1)])
}

Step by step explanation:

Initialize empty result column:

nitrate$prev_five_rainfall = NA 

For each line in the nitrate df: (i = 1,2,3,4,5)

for (i in 1:length(nitrate$ndate)) { 

Grab the day we want final result for:

    day = nitrate$ndate[i] 

Take the rainfull sum and it put in in the results column

    nitrate$prev_five_rainfall[i] = sum(rain$mm[(day-6):(day-1)])

Close the for loop :)

}

Disclaimer: This answer is basic in that:

  • It will break if nitrate's ndate < 6
  • It will be incorrect if some dates are missing in the rain dataframe
  • It will be slow on larger data

As you get more experience with R, you might use data manipulation packages like dplyr or data.table for these types of manipulations.