ayshelina - 21 days ago
R Question

R - group rows in a data frame while counting occurrences in one column and summing values in another

I am trying to modify my data frame:

  start end duration_time
1     1   2         2.438
2     2   1         3.901
3     1   2        18.037
4     2   3        85.861
5     3   4        83.922


and create something like this:

  start end duration_time weight
1     1   2        20.475      2
2     2   1         3.901      1
4     2   3        85.861      1
5     3   4        83.922      1


so that duplicate start-end combinations are collapsed into a single row, weight counts how many rows were merged, and duration_time is the sum over those rows.

I already have part of it working; I just can't get the weight to work:

ddply(df, c("start","end"), summarise, weight=? ,duration_time=sum(duration_time))


Thanks for any replies.

Answer

The simplest solution uses data.table:

library(data.table)
# .N is the number of rows in each start/end group
setDT(df)[, .(duration_time = sum(duration_time), wt = .N), by = c("start", "end")]

   start end duration_time wt
1:     1   2        20.475  2
2:     2   1         3.901  1
3:     2   3        85.861  1
4:     3   4        83.922  1
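
For completeness, the ddply call from the question can probably be finished by counting the rows in each group with length(). This is a minimal sketch, assuming plyr is loaded and that df holds the example data from the question (rebuilt here so the snippet runs on its own). The order of the arguments matters: plyr's summarise evaluates them one after another, so weight has to be computed before duration_time is overwritten by its sum.

```r
library(plyr)

# Example data from the question, rebuilt so the snippet is self-contained
df <- data.frame(
  start = c(1, 2, 1, 2, 3),
  end = c(2, 1, 2, 3, 4),
  duration_time = c(2.438, 3.901, 18.037, 85.861, 83.922)
)

# weight comes first: it counts the rows of each group while
# duration_time still holds one value per original row
res <- ddply(df, c("start", "end"), summarise,
             weight = length(duration_time),
             duration_time = sum(duration_time))
print(res)
```

If dplyr is loaded after plyr, its summarise masks plyr's, so it may be safer to pass plyr::summarise explicitly in that case.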

An alternative using dplyr and tidyr:

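library(dplyr)
library(tidyr)
# Merge start and end into a single key column, e.g. "1_2"
df1 <- df %>% unite(by_var, start, end)
# Count rows per key, sum durations per key, then split the key back into start/end
df2 <- cbind(
    df1 %>% count(by_var),
    df1 %>% group_by(by_var) %>%
        summarise(duration_time = sum(duration_time)) %>%
        separate(by_var, c("start", "end"), convert = TRUE)
)[c(3, 4, 5, 2)]  # keep start, end, duration_time, n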

> df2
  start end duration_time n
1     1   2        20.475 2
2     2   1         3.901 1
3     2   3        85.861 1
4     3   4        83.922 1
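
The same result can likely be reached with dplyr alone by grouping on both columns directly, which avoids the unite/separate round trip. This is a sketch rather than the answerer's method: it assumes df holds the example data from the question (rebuilt here) and names the count column weight to match the desired output.

```r
library(dplyr)

# Example data from the question, rebuilt so the snippet is self-contained
df <- data.frame(
  start = c(1, 2, 1, 2, 3),
  end = c(2, 1, 2, 3, 4),
  duration_time = c(2.438, 3.901, 18.037, 85.861, 83.922)
)

res <- df %>%
  group_by(start, end) %>%
  summarise(duration_time = sum(duration_time),  # total duration per start/end pair
            weight = n()) %>%                    # n() counts the rows in each group
  ungroup()
print(res)
```

Here n() plays the same role as .N in the data.table version.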