Andrew - 1 year ago 63
R Question

mean centering based on conditions

I am quite new to R and have a small issue. I have large time-stamped data (dataframe called 'EURO') for a number years where I have calculated returns and the 'Time' variable (a sample is below);

``````Time    Returns
9       5
10      5
11      8
12      7
13      7
14      2
15      -4
16      6
17      8
9       5
10      7
11      4
12      6
13      1
14      0
15      -4
16      7
17      -4
9       8
10      1
11      1
12      6
13      8
14      8
15      4
16      -6
17      7
``````

I have used the following code to work out the mean return of each individual timestamp using;

``````Timeaverages <- aggregate(Returns~Time,EURO,mean)
``````

From that I have the average returns for each timestamp. However now I want to mean center my data, that is, minus the average return for each timestamp. For instance, if the average return for the time 9 is 6, so the meancentred return will be 5-6 = -1. For the time 10, the average return is 5, so the meancentred return will be 2 etc.

``````Time    Returns   Meancentered
9       5             -1
10      7              2
11      7              3
12      6              0
13      6              1
14      4              0
15      -3             -2
16      5              3
17      9              5
9       5              -1
10      7              2
11      4              0
12      6              0
13      1              -4
14      0              -4
15      -4             -3
16      7              5
17      -4             -8
9       8              2
10      1              -4
11      1              -3
12      6              0
13      8              3
14      8              4
15      4              5
16      -6             -8
17      7              3
``````

Any help much appreciated

If I've understood correctly you're looking for `merge()`.

``````merge.df <- merge(EURO, Timeaverages, by="Time", suffixes = c(".ind", ".mean"))

#   Time Returns.ind Returns.mean
#1     9           5     6.000000
#2     9           5     6.000000
#3     9           8     6.000000
#4    10           1     4.333333
#5    10           5     4.333333
# etc
``````

Now just get the difference between the two returns columns and `cbind` it to `merge.df`.

``````diff <- merge.df\$Returns.ind - merge.df\$Returns.mean
cbind(merge.df, diff)

#   Time Returns.ind Returns.mean       diff
#1     9           5     6.000000 -1.0000000
#2     9           5     6.000000 -1.0000000
#3     9           8     6.000000  2.0000000
#4    10           1     4.333333 -3.3333333
#5    10           5     4.333333  0.6666667
#6    10           7     4.333333  2.6666667
#7    11           1     4.333333 -3.3333333
``````

If you use `dplyr`, you can do the whole lot in one chain by grouping, mutating, ungrouping and mutating again.
``````library(dplyr)