no_one no_one - 1 month ago 9
R Question

find max value in a data.frame by group and show its date as year-month-day

Here my dataframe:

df = read.csv(text = '"Date","Value","ID","WY"
1975-02-01,-1.16543693088,"Tweed",1975
1975-03-01,-1.05372283483,"Tweed",1975
1975-04-01,-1.06632370439,"Tweed",1975
1975-05-01,-1.18903485356,"Tweed",1975
1992-05-01,-1.04737467143,"Ouse",1992
1992-06-01,-1.4058281451,"Ouse",1992
1992-07-01,-1.13608647243,"Ouse",1992
1992-08-01,-0.802566581309,"Ouse",1992
1992-09-01,-0.551433852821,"Ouse",1992
1992-10-01,-0.625997598552,"Ouse",1993
1992-11-01,-0.483559758609,"Ouse",1993
1992-12-01,-0.792013395632,"Ouse",1993
1993-01-01,-0.754618121962,"Ouse",1993
1993-02-01,-1.2504282139,"Ouse",1993
1996-01-01,-0.945410385985,"Trent",1996
1996-02-01,-0.84249575782,"Trent",1996
1996-03-01,-1.10332425045,"Trent",1996
1996-04-01,-1.22634133042,"Trent",1996
1996-05-01,-1.2335181635,"Trent",1996
1996-06-01,-1.23451130358,"Trent",1996
1996-07-01,-1.25902677738,"Trent",1996
1996-08-01,-1.13068733413,"Trent",1996', header = TRUE)


I need to find the annual maximum value for each ID and WY group.

The following code do the trick very easily but its output only shows the year of each annual maximum whereas I am interested also in the relative month and day:

df_AMAX = aggregate(df$Value, by = list(df$WY, df$ID), max)
colnames(df_AMAX) = c('Date', 'ID', 'Value')
print(df_AMAX)

Date ID Value
1 1992 Ouse -0.5514339
2 1993 Ouse -0.4835598
3 1996 Trent -0.8424958
4 1975 Tweed -1.0537228


My output should be:

Date ID Value
1 1992-09-01 Ouse -0.5514339
2 1993-11-01 Ouse -0.4835598
3 1996-02-01 Trent -0.8424958
4 1975-03-01 Tweed -1.0537228


It should be a silly thing but please let me know if you have any suggestion.
Thanks

Answer

Use subset with ave. Note that the function passed to ave returns a logical but ave will coerce it to the class of Value so we use !! to make it logical again. No packages are used.

mx_all <- function(x) if (length(x)) x == max(x)
subset(df, !!ave(Value, ID, WY, FUN = mx_all))

or

mx_first <- function(x) if (length(x)) seq_along(x) == which.max(x)
subset(df, !!ave(Value, ID, WY, FUN = mx_first))

These give the same answer for the sample input and will always give the same answer if there is a unique maximum in each group but if there are multiple maxima in a group then the first one gives all of them and the second gives the first.

Comments