Ronja Ronja - 1 year ago 62
R Question

Combining "aggregate" with a user-made function

I am trying to extract information from a dataset of climatic variables (daily values), spanning multiple years. Presence and absence of snow is coded as 1 and 0 in the data set. Each line corresponds to a day and I have a columns cody for date and half year.
For each half year period I am attempting to find the value of the date after the last occurence of snow presence (1). I tried this by grouping the variables via aggregate, but the use of a custom function did not succeed, since "aggregate" seems to pass the columns of the dataframe without a name.

df<-data.frame(Date=as.Date(dates,format = "%Y-%m-%d"),

I know that using loops is not that efficient, but since I want to avoid applying the function to the "not-plot-columns" I am using:

for(plots in names(df)[- which(names(df) %in% c("Date","halves"))]){
meltday[[plots<-aggregate(df[[plots]]~halves,df,df$Date[last(which(snow.days.half$Date==0)) + 1])

This produces a error, since the last part is not evaluated as a function.
So I tried with a home-made function to find the Date+1 of the last occurence of 1 in all plot colums.


But then trying

for(plots in names(df)[- which(names(df) %in% c("Date","halves"))]){
meltday[[plots]]<-aggregate(df[[plots]] ~ halves,df,snowmelt)

left me with an error statement, since the function coerces its input to a list.

So, I am quite stuck and will be very thankful and apprechiative for any comments or answers that point me into the right direction.
My desired output would be a dataframe with dates in it: along the lines of

halves plot1 plot2 plot3
1993-01-04 1993-01-04 1993-01-03 1993-01-05
1994-01-01 1994-02-22 1994-02-24 1994-02-23

EDIT: added desired output for clarification.


Answer Source

Using tidyr and dplyr

I changed the way you input the halves variable in the OP to use a character vector (I think that might've been a bug)

df<-data.frame(Date=as.Date(dates,format = "%Y-%m-%d"),

Then I use the gather command to put the data into a long-format, which I then turn into a grouped data-frame for which the summarise function can aggregate for each plot. I put the data back into a wide-format using spread.

df %>%
gather(plot, snow, plot1:plot3) %>%
group_by(plot, halves) %>%
arrange(Date) %>%
summarise(meltday=Date[max(which(snow==T))+1]) %>%
spread(plot, meltday)

The output is

# A tibble: 2 × 4
      halves      plot1      plot2      plot3
*     <fctr>     <date>     <date>     <date>
1 1993-01-01 1993-01-04 1993-01-03 1993-01-05
2 1994-01-01 1994-02-22 1994-02-24 1994-02-23

Note: this will have an error if the meltday hasn't occurred (i.e. there is snow in the final entry). I haven't thought of a neater way to detect the 'last' snow day yet.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download