pretz pretz - 1 month ago 10
R Question

R: removing repetitions from one column and sum another

Say I have a data matrix that looks like this:

X1 X2 X3
1 1 869 1956.78
2 1 869 2006.52
3 1 869 1592.80
4 1 869 6620.00
5 1 869 228.22
6 1 869 110.15
7 1 869 1350.79
8 1 869 78.36
9 1 869 118.61
10 1 869 453.99
11 1 869 5393.16
12 1 869 1641.34
13 1 869 885.80
14 1 869 352.80
15 1 869 105.64
16 2 1 0.00
17 2 313 0.00
18 2 467 0.00
19 2 495 0.00
20 2 135 0.00
21 2 769 0.00
22 2 770 0.00
23 2 771 0.00
24 1 869 375.60
25 2 869 0.00
26 2 869 0.00
27 2 869 0.00
28 2 1 0.00
29 2 1 0.00
30 2 467 0.00


I want to format it as so:

X1 X2 X3
1 1 869 22894.96
2 2 1 0.00
3 2 313 0.00
4 2 467 0.00
5 2 495 0.00
6 2 135 0.00
7 2 769 0.00
8 2 770 0.00
9 2 771 0.00
10 1 869 375.60
11 2 869 0.00
12 2 1 0.00
13 2 467 0.00


So basically, I want to remove consecutive repetitions in
X2
and
X1
and then sum the associated values in
X3
.

The seems like a very complex idea to me, perhaps there is an elegant solution. I know that if X3 is not zero, then the corresponding value in X1 is "1". So I can determine where this happens by using placement:

placement <- grep(1, df$X1)


I'm not sure how to continue from here.

lmo lmo
Answer

Here is a solution with data.table. Assuming the data.frame is named df,

library(data.table)
setDT(df)

df[, temp:=rleid(X1, X2)][, .(X3=sum(X3)), by=.(X1, X2, temp)][, temp := NULL][]
    X1  X2       X3
 1:  1 869 22894.96
 2:  2   1     0.00
 3:  2 313     0.00
 4:  2 467     0.00
 5:  2 495     0.00
 6:  2 135     0.00
 7:  2 769     0.00
 8:  2 770     0.00
 9:  2 771     0.00
10:  1 869   375.60
11:  2 869     0.00
12:  2   1     0.00
13:  2 467     0.00

The multiple use of [], called chaining, allows for multiple function calls in a single line of code. Further it allows you to manipulate the data.table and then call functions on this manipulated value.

  • temp:=rleid(X1, X2) creates a temporary variable that creates an ID for X2 and X3 combinations that allows for repeated appearances of non-adjacent values.
  • .(X3=sum(X3)), by=.(X1, X2, temp) sums X3 by the three variables.
  • temp := NULL removes the temporary variable
  • the empty [] at the end prints out the result.