BonnieM - 6 months ago 34

R Question

I have two data frames: rainfall data collected daily and nitrate concentrations of water samples collected irregularly, approximately once a month. **I would like to create a vector of values for each nitrate concentration that is the sum of the previous 5 days' rainfall.** Basically, I need to match the nitrate date with the rain date, sum the previous 5 days' rainfall, then print the sum with the nitrate data.

I think I need to either make a

`function`

`for`

`tapply`

`aggregate`

Here are 2 example data frames:

`# rainfall df`

mm<- c(0,0,0,0,5, 0,0,2,0,0, 10,0,0,0,0)

date<- c(1:15)

rain <- data.frame(cbind(mm, date))

# b/c sums of rainfall depend on correct chronological order, make sure the data are in order by date.

rain[ do.call(order, list(rain$date)),]

# nitrate df

nconc <- c(15, 12, 14, 20, 8.5) # nitrate concentration

ndate<- c(6,8,11,13,14)

nitrate <- data.frame(cbind(nconc, ndate))

I would like to have a way of finding the matching rainfall date for each nitrate measurement, such as:

`match(nitrate$date[i] %in% rain$date)`

(Note: Will

`match`

`as.Date`

`sum(rain$mm[j-6:j-1]`

And prints the sum in a new column in nitrate

`print(nitrate$mm_sum[i])`

To make sure it's clear what result I'm looking for, here's how to do the calculation 'by hand'. The first nitrate concentration was collected on day 6, so the sum of rainfall on days 1-5 is 5mm.

Many thanks in advance.

Answer

You were more or less there!

```
nitrate$prev_five_rainfall = NA
for (i in 1:length(nitrate$ndate)) {
day = nitrate$ndate[i]
nitrate$prev_five_rainfall[i] = sum(rain$mm[(day-6):(day-1)])
}
```

Step by step explanation:

*Initialize empty result column:*

```
nitrate$prev_five_rainfall = NA
```

*For each line in the nitrate df: (i = 1,2,3,4,5)*

```
for (i in 1:length(nitrate$ndate)) {
```

*Grab the day we want final result for:*

```
day = nitrate$ndate[i]
```

*Take the rainfull sum and it put in in the results column*

```
nitrate$prev_five_rainfall[i] = sum(rain$mm[(day-6):(day-1)])
```

*Close the for loop :)*

```
}
```

Disclaimer: This answer is basic in that:

- It will break if nitrate's ndate < 6
- It will be incorrect if some dates are missing in the rain dataframe
- It will be slow on larger data

As you get more experience with R, you might use data manipulation packages like `dplyr`

or `data.table`

for these types of manipulations.