M. Beausoleil - 2 months ago
R Question

Count the number of elements between 2 dates conditionally on a variable in R

I'm trying to count the number of precipitation values at or below a certain threshold (say, less than or equal to 50) between two dates.

Basically, I have a vector cuts that contains the dates I want to count between, inclusively. I want to use the cuts vector to "subset" the dataset into different bins and then count the number of events in each bin where the precipitation was 50 mm or less.

I'm using dplyr and a for loop at the moment, but nothing is working.

set.seed(12345)
df = data.frame(date = seq(as.Date("2000/03/01"), as.Date("2002/03/01"), "days"),
precipitation = rnorm(length(seq(as.Date("2000/03/01"), as.Date("2002/03/01"), "days")),80,20))
cuts = c("2001-11-25","2002-01-01","2002-02-18","2002-03-01")
for (i in 1:length(cuts)) {
df %>% summarise(count.prec = if (date > cuts[i] | date < cuts[i+1]) {count(precipitation <= 50)})
}


But I have this error message:

Error: no applicable method for 'group_by_' applied to an object of class "logical"
In addition: Warning message:
In if (c(11017, 11018, 11019, 11020, 11021, 11022, 11023, 11024, :
the condition has length > 1 and only the first element will be used


This is not working either:

for (i in 1:length(cuts)) {
df %>% if (date > cuts[i] | date < cuts[i+1])%>% summarise(count.prec = count(precipitation <= 50))
}

Answer

You could try:

df %>%
  group_by(gr = cut(date, breaks = as.Date(cuts))) %>%
  summarise(res = sum(precipitation <= 50))

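Which gives (the NA row collects every date that falls outside the bins defined by cuts, i.e. everything before 2001-11-25 plus the final date 2002-03-01, because the bins are closed on the left and open on the right):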

# A tibble: 4 × 2
          gr   res
      <fctr> <int>
1 2001-11-25     1
2 2002-01-01     4
3 2002-02-18     2
4         NA    40
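
If the NA row is not wanted, one possible variation (a sketch, not necessarily what the asker needs) is to drop the dates outside the range of cuts before binning and to pass include.lowest = TRUE so the last bin also contains its end date. The names res and counts are only illustrative. Note that cut() puts each date in exactly one bin, so a boundary date such as 2002-01-01 is counted only in the bin it starts. The sapply() version at the end treats both ends of every bin as inclusive, as described in the question, at the cost of counting boundary dates twice.

```r
library(dplyr)

# Same example data as in the question
set.seed(12345)
dates <- seq(as.Date("2000-03-01"), as.Date("2002-03-01"), by = "day")
df <- data.frame(date = dates,
                 precipitation = rnorm(length(dates), mean = 80, sd = 20))
cuts <- as.Date(c("2001-11-25", "2002-01-01", "2002-02-18", "2002-03-01"))

# Variation 1: cut() with the out-of-range dates removed first
res <- df %>%
  # keep only the rows between the first and last cut date
  filter(date >= min(cuts), date <= max(cuts)) %>%
  # bins are [start, next start); include.lowest = TRUE closes the last bin
  mutate(gr = cut(date, breaks = cuts, include.lowest = TRUE)) %>%
  group_by(gr) %>%
  summarise(res = sum(precipitation <= 50))
res

# Variation 2: both ends of every bin inclusive (boundary dates count twice)
counts <- sapply(seq_len(length(cuts) - 1), function(i) {
  sum(df$date >= cuts[i] & df$date <= cuts[i + 1] & df$precipitation <= 50)
})
counts
```

Converting cuts with as.Date() up front also avoids comparing a Date column against character strings, which is what the loop in the question was doing.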