Chris Chris - 2 months ago 12
R Question

Calculate percentage change from a base date within groups

I have a dataframe with the following structure:

set.seed(12345)
df <- data.frame(cat1 = rep(1:4, each = 6),
cat2 = rep(1:2, each = 3,4),
day = rep(as.Date(c("2016-01-01", "2016-01-02", "2016-01-03")),8),
x = sample(80:120,24),
y = sample(80:120,24))

cat1 cat2 day x y
1 1 1 01.01.2016 109 106
2 1 1 02.01.2016 115 95
3 1 1 03.01.2016 120 107
4 1 2 01.01.2016 113 100
5 1 2 02.01.2016 96 88
6 1 2 03.01.2016 85 97
7 2 1 01.01.2016 91 118
8 2 1 02.01.2016 97 80
9 2 1 03.01.2016 104 86
10 2 2 01.01.2016 111 101
11 2 2 02.01.2016 81 91
12 2 2 03.01.2016 84 90
13 3 1 01.01.2016 101 105
14 3 1 02.01.2016 80 108
15 3 1 03.01.2016 90 96
16 3 2 01.01.2016 92 83
17 3 2 02.01.2016 89 99
18 3 2 03.01.2016 112 109
19 4 1 01.01.2016 118 111
20 4 1 02.01.2016 100 115
21 4 1 03.01.2016 103 85
22 4 2 01.01.2016 86 112
23 4 2 02.01.2016 98 81
24 4 2 03.01.2016 105 113


I need to calculate an index from a fixed date within the dataset over a set of subgroups (cat1, cat2). My desired outcome when indexing on 02.01.2016 looks like this:

cat1 cat2 day x y xi yi
1 1 1 01.01.2016 109 106 0,94783 1,11579
2 1 1 02.01.2016 115 95 1,00000 1,00000
3 1 1 03.01.2016 120 107 1,04348 1,12632
4 1 2 01.01.2016 113 100 1,17708 1,13636
5 1 2 02.01.2016 96 88 1,00000 1,00000
6 1 2 03.01.2016 85 97 0,88542 1,10227
7 2 1 01.01.2016 91 118 0,93814 1,47500
8 2 1 02.01.2016 97 80 1,00000 1,00000
9 2 1 03.01.2016 104 86 1,07216 1,07500
10 2 2 01.01.2016 111 101 1,37037 1,10989
11 2 2 02.01.2016 81 91 1,00000 1,00000
12 2 2 03.01.2016 84 90 1,03704 0,98901
13 3 1 01.01.2016 101 105 1,26250 0,97222
14 3 1 02.01.2016 80 108 1,00000 1,00000
15 3 1 03.01.2016 90 96 1,12500 0,88889
16 3 2 01.01.2016 92 83 1,03371 0,83838
17 3 2 02.01.2016 89 99 1,00000 1,00000
18 3 2 03.01.2016 112 109 1,25843 1,10101
19 4 1 01.01.2016 118 111 1,18000 0,96522
20 4 1 02.01.2016 100 115 1,00000 1,00000
21 4 1 03.01.2016 103 85 1,03000 0,73913
22 4 2 01.01.2016 86 112 0,87755 1,38272
23 4 2 02.01.2016 98 81 1,00000 1,00000
24 4 2 03.01.2016 105 113 1,07143 1,39506


I tried extracting the reference dates for each group with data.table subsets and then using this extracts to calculate indexes but I haven't figured out how to do that properly.

Answer

Highly likely this has been answered before, but here are two options with dplyr and data.table:

library(dplyr)
df %>% 
  group_by(cat1, cat2) %>% 
  mutate(xi = x/x[day=='2016-01-02'], 
         yi = y/y[day=='2016-01-02'])

library(data.table)
setDT(df)[, `:=` (xi = x/x[day=='2016-01-02'], 
                  yi = y/y[day=='2016-01-02']), 
          by = .(cat1, cat2)]

which results in:

    cat1 cat2        day   x   y        xi        yi
 1:    1    1 2016-01-01 109 106 0.9478261 1.1157895
 2:    1    1 2016-01-02 115  95 1.0000000 1.0000000
 3:    1    1 2016-01-03 120 107 1.0434783 1.1263158
 4:    1    2 2016-01-01 113 100 1.1770833 1.1363636
 5:    1    2 2016-01-02  96  88 1.0000000 1.0000000
 6:    1    2 2016-01-03  85  97 0.8854167 1.1022727
 7:    2    1 2016-01-01  91 118 0.9381443 1.4750000
 8:    2    1 2016-01-02  97  80 1.0000000 1.0000000
 9:    2    1 2016-01-03 104  86 1.0721649 1.0750000
10:    2    2 2016-01-01 111 101 1.3703704 1.1098901
11:    2    2 2016-01-02  81  91 1.0000000 1.0000000
12:    2    2 2016-01-03  84  90 1.0370370 0.9890110
13:    3    1 2016-01-01 101 105 1.2625000 0.9722222
14:    3    1 2016-01-02  80 108 1.0000000 1.0000000
15:    3    1 2016-01-03  90  96 1.1250000 0.8888889
16:    3    2 2016-01-01  92  83 1.0337079 0.8383838
17:    3    2 2016-01-02  89  99 1.0000000 1.0000000
18:    3    2 2016-01-03 112 109 1.2584270 1.1010101
19:    4    1 2016-01-01 118 111 1.1800000 0.9652174
20:    4    1 2016-01-02 100 115 1.0000000 1.0000000
21:    4    1 2016-01-03 103  85 1.0300000 0.7391304
22:    4    2 2016-01-01  86 112 0.8775510 1.3827160
23:    4    2 2016-01-02  98  81 1.0000000 1.0000000
24:    4    2 2016-01-03 105 113 1.0714286 1.3950617