st19297 st19297 - 26 days ago 4
R Question

How to mix daily and quarterly data in R?

I have quarterly fundamental data from Compustat that looks like this:

fund<-data.frame(quarterlydate=as.Date(c("03/31/1966","06/30/1966"), '%m/%d/%Y'),
gvkey=c(1000,1000,1001,1001), tic=c("XTL", "XTL", "ABL","ABL"),
sales=c(70,75,20,22))

> fund
quarterlydate gvkey tic sales
1 1966-03-31 1000 XTL 70
2 1966-06-30 1000 XTL 75
3 1966-03-31 1001 ABL 20
4 1966-06-30 1001 ABL 22


I also have daily price data from CRSP that looks like this:

prices<-data.frame(dailydate=seq(as.Date("1966/01/01"), as.Date("1966/06/30"), "days"), gvkey=c(rep(1000, 181),rep(1001, 181)),
tic=c(rep("XTL",181), rep("ABL",181)),
price=floor(runif(length(seq(as.Date("1966/01/01"), as.Date("1966/06/30"), "days")), min=0, max=50)))

> head(prices)
dailydate gvkey tic price
1 1966-01-01 1000 XTL 44
2 1966-01-02 1000 XTL 42
3 1966-01-03 1000 XTL 42
4 1966-01-04 1000 XTL 16
5 1966-01-05 1000 XTL 27
6 1966-01-06 1000 XTL 36

> tail(prices)
dailydate gvkey tic price
357 1966-06-25 1001 ABL 0
358 1966-06-26 1001 ABL 28
359 1966-06-27 1001 ABL 4
360 1966-06-28 1001 ABL 18
361 1966-06-29 1001 ABL 49
362 1966-06-30 1001 ABL 4


Question:

1) How can I merge such a quarterly and daily datasets to have a dataframe like the one below?

2) How can I calculate the average quarterly price and assign the values to the quarters? (the "average_quarterly_price" variable in the table below)

I want a merged data frame like this:

dailydate quarterlydates gvkey tic price sales average_quarterly_price
1 1966-01-01 1966-03-31 1000 XTL 1 70 32
2 1966-01-02 1966-03-31 1000 XTL 10 70 32
3 1966-01-03 1966-03-31 1000 XTL 14 70 32
4 1966-01-04 1966-03-31 1000 XTL 29 70 32
5 1966-01-05 1966-03-31 1000 XTL 1 70 32
6 1966-01-06 1966-03-31 1000 XTL 43 70 32
.
.
.
182 1966-04-01 1966-06-31 1000 XTL 11 75 41
183 1966-04-02 1966-06-31 1000 XTL 8 75 41
184 1966-04-03 1966-06-31 1000 XTL 16 75 41
185 1966-04-04 1966-06-31 1000 XTL 14 75 41
186 1966-04-05 1966-06-31 1000 XTL 14 75 41
187 1966-04-06 1966-06-31 1000 XTL 20 75 41
.
.
.
364 1966-01-01 1966-03-31 1001 ABL 18 20 15
365 1966-01-02 1966-03-31 1001 ABL 10 20 15
366 1966-01-03 1966-03-31 1001 ABL 13 20 15
367 1966-01-04 1966-03-31 1001 ABL 13 20 15
368 1966-01-05 1966-03-31 1001 ABL 11 20 15
369 1966-01-06 1966-03-31 1001 ABL 13 20 15
.
.
.
545 1966-04-01 1966-06-31 1001 ABL 14 22 16
555 1966-04-02 1966-06-31 1001 ABL 21 22 16
556 1966-04-03 1966-06-31 1001 ABL 18 22 16
557 1966-04-04 1966-06-31 1001 ABL 18 22 16
558 1966-04-05 1966-06-31 1001 ABL 17 22 16
559 1966-04-06 1966-06-31 1001 ABL 18 22 16
.
.
.
724 1966-06-31 1966-06-31 1001 ABL 22 22 16


Of course I am not sure if this is the best dataset format, and would appreciate suggestions. My ultimate purpose is to be able to use both daily and quarterly data in a single analysis. For the sake of example, I want to be able to to find stocks with quarterly Return on Assets in the top 20% percentile AND whose daily prices have been lowest in the last 10 days.

Answer

Create a "yearqtr" class column in each data frame and then perform a left join of the two data frames using common column names. Finally use ave to calculate the mean.

library(zoo) # yearqtr class

fundq <- transform(fund, yearqtr = as.yearqtr(quarterlydate))
pricesq <- transform(prices, yearqtr = as.yearqtr(dailydate))
m <- merge(pricesq, fundq, all.x = TRUE)
transform(m, avg_price = ave(price, tic, yearqtr))