dward4 - 1 year ago 50
R Question

# Creating new data frame, each element is a selected sum of elements in 2nd data frame where 2 column elements match, sum is from third column

I have a dataframe, test, that looks like

``````    c1   c2    c3
1   98   0     2013-08
2   231  0     2011-01
3   231  2.68  2011-03
4   231  1     2011-01
... ...  ...   ...
``````

That continues on for many more rows. Column c1 has values from 1-297, while c3 has year-month values that consecutively move from 2011-01 to 2015-01. There are multiple rows that have the same c1 and c3 values.

I want to sum up each instance of c1 at each time step (so for all rows where c1 = x and c3 = y, sum those elements and get a result) and output that to a new data frame where each row represents 1 of the types from c1 (1-297), and each column is the corresponding year-month.

I am attempting to use acast (based off a suggestion) to transform it to a data frame with the rows based off of c1 values, with columns from c3, so it looks like

``````    2011-01   2011-02  2011-03 ...
1   0         1.5      2.3     ...
2   0         3.4      0       ...
3   5         2.2      1.1     ...
4   4         2.2      4.4     ...
... ...  ...   ...
``````

I have been attempting to transform this via acast:

``````acast(test, test\$c3 ~ test\$c1, value.var = "c2")
``````

But end up with a matrix/data frame of type int. The rows and columns are correct (1-297, 2011-01 - 2015-01), however the values inside the cells are wrong.

Again just to clarify, in the new data frame each element would represent the sum of elements in the first data frame for all elements that share the same c1 and c3 values.

I believe the issue is that acast sees matching combinations and does something that I don't want it to do. How would I solve this problem? I do not need acast if another solution presents itself.

You should use tidyverse packages `dplyr` and `tidyr`:

``````library(dplyr)
library(tidyr)

df <- test %>%
group_by(c1, c3) %>%
summarise(total = sum(c2)) %>%
``````

### Example

I used your simple data frame as an example:

``````#>   c1   c2      c3
#> 1  98 0.00 2013-08
#> 2 231 0.00 2011-01
#> 3 231 2.68 2011-03
#> 4 231 1.00 2011-01
``````

And after running the code, `df` looks like this:

``````#>      c1 `2011-01` `2011-03` `2013-08`
#> 1    98        NA        NA         0
#> 2   231         1      2.68        NA
``````

### Explanation

• `group_by(c1, c3)` groups the variables `c1` and `c3` in your data frame
• `summarise(total = sum(c2))` sums up `c2` (taking into account the `c1`, `c3` groupings)
• `spread(c3, total)` transforms the data frame into a "wide" format with the `c3` variables going across the columns
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download