Dan - 7 months ago 38
R Question

# converting daily Total Returns and Prices to annual values in R

I've data of Total Returns and Stock Prices on a daily basis for some banks, from 1997 to 2015, such that:

``````DATE         Bank1_TotalReturn Bank1_Price Bank2_TR Bank2_P  ... and so on for all other banks
01/01/1997         103.13           10.43       NA       NA
02/01/1997         104.66           11.12   153.89    23.08
03/01/1997            ...             ...      ...      ...
...and so on
for all other
days until
31/12/2015
``````

Using R, I need to convert them in "medium annual values", so that I obtain a single annual medium value both for Total returns and Prices, for each bank, obviously, in order to make a Panel dataset.

nb: data contains lots of missing values, code must consider that issue! :)

You can use `format` to extract the year from `DATE` (once converted to a `Date` class) to convert the date to the year of the date. Then using `dplyr`:

``````library(dplyr)
res <- df %>% group_by(Year=format(as.Date(DATE,format="%d/%m/%Y"),"%Y")) %>%
summarise_at(vars(-DATE), median, na.rm=TRUE)
``````

We first `group_by` the converted `Year`, and then we use `summarise_at` to summarize the `median` for each column except `DATE`. Note that we pass in the argument `na.rm=TRUE` to `median` to ignore `NA`'s.

Or using `aggregate`:

``````df\$Year <- format(as.Date(df\$DATE,format="%d/%m/%Y"),"%Y")
df <- df[,!(names(df)=="DATE")]
res <- aggregate(. ~ Year, data=df, FUN=median, na.rm=TRUE, na.action=NULL)
``````

Here, we preprocess `df` to change the `DATE` column to `Year` and we use the formula `. ~ Year` to specify aggregating all columns grouped by `Year`. Note that in `aggregate`, we specify `na.action=NULL` since the default `na.action will remove rows that have any`NA`. Instead, we pass`na.rm=TRUE`to the function`median`to handle`NA`'s.

To illustrate, I augmented your posted data:

``````df <- structure(list(DATE = c("01/01/1997", "02/01/1997", "03/01/1997",
"04/01/1997", "01/01/1998", "02/01/1998", "03/01/1998", "04/01/1998"
), Bank1_TotalReturn = c(103.13, 104.66, 105.23, NA, 113.13,
114.66, 115.23, NA), Bank1_Price = c(10.43, 11.12, 12.15, NA,
11.43, 12.12, NA, 13.15), Bank2_TR = c(NA, 153.89, 145.89, 136.89,
140.92, 153.89, 145.89, 146.89), Bank2_P = c(NA, 23.08, NA, NA,
20.9, 23.08, 25.73, 25.98)), .Names = c("DATE", "Bank1_TotalReturn",
"Bank1_Price", "Bank2_TR", "Bank2_P"), class = "data.frame", row.names = c(NA,
-8L))
##        DATE Bank1_TotalReturn Bank1_Price Bank2_TR Bank2_P
##1 01/01/1997            103.13       10.43       NA      NA
##2 02/01/1997            104.66       11.12   153.89   23.08
##3 03/01/1997            105.23       12.15   145.89      NA
##4 04/01/1997                NA          NA   136.89      NA
##5 01/01/1998            113.13       11.43   140.92   20.90
##6 02/01/1998            114.66       12.12   153.89   23.08
##7 03/01/1998            115.23          NA   145.89   25.73
##8 04/01/1998                NA       13.15   146.89   25.98
``````

With this data using either approaches, the result is:

``````print(res)
##   Year Bank1_TotalReturn Bank1_Price Bank2_TR Bank2_P
##1  1997            104.66       11.12   145.89  23.080
##2  1998            114.66       12.12   146.39  24.405
``````

If the intent is to compute the mean instead of the median, then just substitute `mean` for `median`. For the `aggregate` solution:

``````res <- aggregate(. ~ Year, data=df, FUN=mean, na.rm=TRUE, na.action=NULL)
print(res)
##  Year Bank1_TotalReturn Bank1_Price Bank2_TR Bank2_P
##1 1997            104.34    11.23333 145.5567 23.0800
##2 1998            114.34    12.23333 146.8975 23.9225
``````