user5727 - 3 months ago
R Question

Using aggregate to sum values greater than 70 in R

I am trying to sum values that are greater than 70 in several different data sets. I believe that aggregate can do this but my research has not pointed to an obvious solution to obtaining the values that exceed seventy in my data sets. I have first used aggregate to get the daily max values and put these values into the data frame called yearmaxs. Here is my code and what I have tried:

# number of times O3 > 70 in a year per site



Sys.setenv(TZ = "UTC")
library(openair)
library(lubridate)
filedir <- "C:/Users/dfmcg/Documents/Thesisfiles/8hravg"
myfiles <- c(list.files(path = filedir))
paste(filedir, myfiles, sep = '/')
npsfiles <- c(paste(filedir, myfiles,sep = '/'))

for (i in npsfiles[22]) {

x <- substr(i,45,61)
y <- paste('C:/Users/dfmcg/Documents/Thesisfiles/exceedenceall', x, sep='/')
timeozone <- import(i, date="DATES", date.format = "%Y-%m-%d %H", header=TRUE, na.strings="NA")

overseventy <- c()
yearmaxs <- aggregate(rolling.O3new ~ format(as.Date(date)), timeozone, max)
colnames(yearmaxs) <- c("date", "daymax")
overseventy <- aggregate(daymax ~ format(as.Date(date)), yearmaxs, sum(yearmaxs$daymax > "70"))
}


I have also tried: sum > "70" and sum(daymax > "70").

My other idea at this point is to use a for loop to iterate through the values. I was hoping that I could use aggregate again to sum the values of interest. Any help at all would be greatly appreciated!

Using FUN = length I get this:


2004-05-27, 1
2004-05-30, 1
2004-05-31, 1
2004-06-01, 1
2004-06-02, 1


But I would like: 2004, 5

Answer

I think you want:

aggregate(daymax ~ format(as.Date(date)), yearmaxs, FUN = length,
          subset = as.numeric(daymax) > 70)

Two things:

  1. you need numerical comparison, so use as.numeric(daymax) > 70 not daymax > "70";
  2. use the subset argument in aggregate.formula.
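
Both points can be seen in a small sketch. It assumes the goal is one count per year, as in the desired output `2004, 5`, so it groups on `format(as.Date(date), "%Y")` rather than on the full date. The toy `yearmaxs` values and the column names `year` and `days_over_70` are made up for illustration and are not from the question's data.

```r
# Toy stand-in for the asker's `yearmaxs` data frame (values are made up).
yearmaxs <- data.frame(
  date   = c("2004-05-27", "2004-05-28", "2004-05-30", "2004-06-01",
             "2005-07-04", "2005-07-05"),
  daymax = c(75, 8, 100, 71, 64, 90),
  stringsAsFactors = FALSE
)

# Pitfall: comparing a number with a string coerces the number to a string,
# so the values are compared as text, not by magnitude.
8 > "70"     # TRUE,  because "8" sorts after "7"
100 > "70"   # FALSE, because "1" sorts before "7"

# Count exceedance days per year: group on the year only ("%Y") and keep
# only the rows whose daily maximum is numerically above 70.
exceed <- aggregate(daymax ~ format(as.Date(date), "%Y"), data = yearmaxs,
                    FUN = length, subset = daymax > 70)
colnames(exceed) <- c("year", "days_over_70")  # hypothetical column names
exceed
```

With this toy data the result has one row per year. If `daymax` was read in as character, wrapping it as `as.numeric(daymax) > 70` in `subset`, as in the answer, should give the same kind of result.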