pretz - 7 months ago 36
R Question

# R: removing repetitions from one column and sum another

Say I have a data matrix that looks like this:

``````   X1  X2      X3
1   1 869 1956.78
2   1 869 2006.52
3   1 869 1592.80
4   1 869 6620.00
5   1 869  228.22
6   1 869  110.15
7   1 869 1350.79
8   1 869   78.36
9   1 869  118.61
10  1 869  453.99
11  1 869 5393.16
12  1 869 1641.34
13  1 869  885.80
14  1 869  352.80
15  1 869  105.64
16  2   1    0.00
17  2 313    0.00
18  2 467    0.00
19  2 495    0.00
20  2 135    0.00
21  2 769    0.00
22  2 770    0.00
23  2 771    0.00
24  1 869  375.60
25  2 869    0.00
26  2 869    0.00
27  2 869    0.00
28  2   1    0.00
29  2   1    0.00
30  2 467    0.00
``````

I want to format it as so:

``````   X1  X2       X3
1   1 869 22894.96
2   2   1    0.00
3   2 313    0.00
4   2 467    0.00
5   2 495    0.00
6   2 135    0.00
7   2 769    0.00
8   2 770    0.00
9   2 771    0.00
10  1 869  375.60
11  2 869    0.00
12  2   1    0.00
13  2 467    0.00
``````

So basically, I want to remove consecutive repetitions in
`X2`
and
`X1`
and then sum the associated values in
`X3`
.

The seems like a very complex idea to me, perhaps there is an elegant solution. I know that if X3 is not zero, then the corresponding value in X1 is "1". So I can determine where this happens by using placement:

``````placement <- grep(1, df\$X1)
``````

I'm not sure how to continue from here.

Here is a solution with `data.table`. Assuming the data.frame is named df,

``````library(data.table)
setDT(df)

df[, temp:=rleid(X1, X2)][, .(X3=sum(X3)), by=.(X1, X2, temp)][, temp := NULL][]
X1  X2       X3
1:  1 869 22894.96
2:  2   1     0.00
3:  2 313     0.00
4:  2 467     0.00
5:  2 495     0.00
6:  2 135     0.00
7:  2 769     0.00
8:  2 770     0.00
9:  2 771     0.00
10:  1 869   375.60
11:  2 869     0.00
12:  2   1     0.00
13:  2 467     0.00
``````

The multiple use of `[]`, called chaining, allows for multiple function calls in a single line of code. Further it allows you to manipulate the data.table and then call functions on this manipulated value.

• `temp:=rleid(X1, X2)` creates a temporary variable that creates an ID for X2 and X3 combinations that allows for repeated appearances of non-adjacent values.
• `.(X3=sum(X3)), by=.(X1, X2, temp)` sums X3 by the three variables.
• `temp := NULL` removes the temporary variable
• the empty `[]` at the end prints out the result.