Andrew Andrew - 1 month ago 8
R Question

mean centering based on conditions

I am quite new to R and have a small issue. I have large time-stamped data (dataframe called 'EURO') for a number years where I have calculated returns and the 'Time' variable (a sample is below);

Time Returns
9 5
10 5
11 8
12 7
13 7
14 2
15 -4
16 6
17 8
9 5
10 7
11 4
12 6
13 1
14 0
15 -4
16 7
17 -4
9 8
10 1
11 1
12 6
13 8
14 8
15 4
16 -6
17 7


I have used the following code to work out the mean return of each individual timestamp using;

Timeaverages <- aggregate(Returns~Time,EURO,mean)


From that I have the average returns for each timestamp. However now I want to mean center my data, that is, minus the average return for each timestamp. For instance, if the average return for the time 9 is 6, so the meancentred return will be 5-6 = -1. For the time 10, the average return is 5, so the meancentred return will be 2 etc.

Time Returns Meancentered
9 5 -1
10 7 2
11 7 3
12 6 0
13 6 1
14 4 0
15 -3 -2
16 5 3
17 9 5
9 5 -1
10 7 2
11 4 0
12 6 0
13 1 -4
14 0 -4
15 -4 -3
16 7 5
17 -4 -8
9 8 2
10 1 -4
11 1 -3
12 6 0
13 8 3
14 8 4
15 4 5
16 -6 -8
17 7 3


Any help much appreciated

Joe Joe
Answer

If I've understood correctly you're looking for merge().

merge.df <- merge(EURO, Timeaverages, by="Time", suffixes = c(".ind", ".mean"))

#   Time Returns.ind Returns.mean
#1     9           5     6.000000
#2     9           5     6.000000
#3     9           8     6.000000
#4    10           1     4.333333
#5    10           5     4.333333
# etc

Now just get the difference between the two returns columns and cbind it to merge.df.

diff <- merge.df$Returns.ind - merge.df$Returns.mean
cbind(merge.df, diff)

#   Time Returns.ind Returns.mean       diff
#1     9           5     6.000000 -1.0000000
#2     9           5     6.000000 -1.0000000
#3     9           8     6.000000  2.0000000
#4    10           1     4.333333 -3.3333333
#5    10           5     4.333333  0.6666667
#6    10           7     4.333333  2.6666667
#7    11           1     4.333333 -3.3333333

Addendum

If you use dplyr, you can do the whole lot in one chain by grouping, mutating, ungrouping and mutating again.

library(dplyr)
EURO %>% group_by(Time) %>% mutate(Returns.ind = mean(Returns)) %>% 
         ungroup() %>% mutate(diff = Returns - Returns.ind)

No intermediate objects needed.