Agustín Indaco - 1 year ago 77
R Question

# Calculate difference in different columns between rows by group

I have data on the work stations where workers worked, by day, and I need to find how many days a worker began working at the same station where he left off the previous day. Each observation is one work-day per worker.

``````
worker.id | start.station | end.station | day
1         | 234           | 342         | 2015-01-02
1         | 342           | 425         | 2015-01-03
1         | 235           | 621         | 2015-01-04
2         | 155           | 732         | 2015-01-02
2         | 318           | 632         | 2015-01-03
2         | 632           | 422         | 2015-01-04
``````

So the desired outcome is a new variable, `same`, that flags the days on which a worker started at the same work station where he left off the previous day (with NA or FALSE for each worker's first observation):

``````
worker.id | start.station | end.station | day        | same
1         | 234           | 342         | 2015-01-02 | FALSE
1         | 342           | 425         | 2015-01-03 | TRUE
1         | 235           | 621         | 2015-01-04 | FALSE
2         | 155           | 732         | 2015-01-02 | FALSE
2         | 318           | 632         | 2015-01-03 | FALSE
2         | 632           | 422         | 2015-01-04 | TRUE
``````
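A minimal base R sketch of the per-worker comparison, assuming the rows can be sorted by `worker.id` and `day` (converting `day` with `as.Date()` is an assumption added here so the sort is chronological). Within each worker, `ave()` shifts `end.station` down by one row, so every day lines up with the previous day's end station; each worker's first day has no predecessor and is marked `FALSE`.

```r
# Sample data from the question; day is stored as a Date so sorting is chronological
df <- data.frame(
  worker.id     = c(1, 1, 1, 2, 2, 2),
  start.station = c(234, 342, 235, 155, 218, 632),
  end.station   = c(342, 425, 621, 732, 632, 422),
  day           = as.Date(c("2015-01-02", "2015-01-03", "2015-01-04",
                            "2015-01-02", "2015-01-03", "2015-01-04"))
)

# Put each worker's days in chronological order
df <- df[order(df$worker.id, df$day), ]

# Previous day's end station for the same worker (NA on each worker's first day)
prev.end <- ave(df$end.station, df$worker.id,
                FUN = function(x) c(NA, head(x, -1)))

# TRUE when the worker starts where he ended the previous day, FALSE otherwise
df$same <- !is.na(prev.end) & df$start.station == prev.end
df
```

Because the shift happens inside each `worker.id` group, the last row of worker 1 is never compared with the first row of worker 2.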

I think something using dplyr would work, but I'm not sure how.
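A possible dplyr sketch, assuming dplyr is installed and that `day` can be converted with `as.Date()` for sorting. Grouping by `worker.id` makes `lag()` look only at the previous row of the same worker, so no hard-coded first date is needed, and `coalesce()` turns the `NA` that `lag()` produces on each worker's first day into `FALSE`. The `lag()` used here is `dplyr::lag()`; base R's `stats::lag()` shifts the time base of a time series rather than the values of a plain vector.

```r
library(dplyr)

# Sample data from the question; day is stored as a Date so sorting is chronological
df <- data.frame(
  worker.id     = c(1, 1, 1, 2, 2, 2),
  start.station = c(234, 342, 235, 155, 218, 632),
  end.station   = c(342, 425, 621, 732, 632, 422),
  day           = as.Date(c("2015-01-02", "2015-01-03", "2015-01-04",
                            "2015-01-02", "2015-01-03", "2015-01-04"))
)

df <- df %>%
  arrange(worker.id, day) %>%   # chronological order within each worker
  group_by(worker.id) %>%       # lag() now only sees rows of the same worker
  mutate(same = coalesce(start.station == lag(end.station), FALSE)) %>%
  ungroup()

df
```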

Thanks!

``````
library(dplyr)  # lag() below is dplyr::lag(); stats::lag() does not shift plain vectors

worker.id <- c(1, 1, 1, 2, 2, 2)
start.station <- c(234, 342, 235, 155, 218, 632)
end.station <- c(342, 425, 621, 732, 632, 422)
day <- c("2015-01-02", "2015-01-03", "2015-01-04", "2015-01-02", "2015-01-03", "2015-01-04")
df <- data.frame(worker.id, start.station, end.station, day)
df
#   worker.id start.station end.station        day
# 1         1           234         342 2015-01-02
# 2         1           342         425 2015-01-03
# 3         1           235         621 2015-01-04
# 4         2           155         732 2015-01-02
# 5         2           218         632 2015-01-03
# 6         2           632         422 2015-01-04

# Each worker's first observation is detected via the hard-coded date "2015-01-02"
df$same <- ifelse(df$start.station != lag(df$end.station) |
                    df$day == "2015-01-02", "FALSE", "TRUE")
df
#   worker.id start.station end.station        day  same
# 1         1           234         342 2015-01-02 FALSE
# 2         1           342         425 2015-01-03  TRUE
# 3         1           235         621 2015-01-04 FALSE
# 4         2           155         732 2015-01-02 FALSE
# 5         2           218         632 2015-01-03 FALSE
# 6         2           632         422 2015-01-04  TRUE
``````
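The attempt above only identifies each worker's first day because every worker happens to start on 2015-01-02. A sketch that avoids the hard-coded date, assuming dplyr is loaded for `lag()` and that rows are sorted by worker and day, checks instead that the previous row belongs to the same worker. The last step counts, per worker, the days that started at the previous day's end station, which is the number the question asks for. The names `same.worker` and `counts` are illustrative.

```r
library(dplyr)

# Sample data from the question; day is stored as a Date so sorting is chronological
df <- data.frame(
  worker.id     = c(1, 1, 1, 2, 2, 2),
  start.station = c(234, 342, 235, 155, 218, 632),
  end.station   = c(342, 425, 621, 732, 632, 422),
  day           = as.Date(c("2015-01-02", "2015-01-03", "2015-01-04",
                            "2015-01-02", "2015-01-03", "2015-01-04"))
)

# Rows must be ordered by worker, then day, so lag() points at the previous day
df <- df[order(df$worker.id, df$day), ]

# The previous row only counts if it belongs to the same worker
same.worker <- df$worker.id == lag(df$worker.id)
df$same <- same.worker & df$start.station == lag(df$end.station)

# The very first row has no previous row at all (NA); treat it as FALSE
df$same[is.na(df$same)] <- FALSE

# Number of days per worker that began at the previous day's end station
counts <- aggregate(same ~ worker.id, data = df, FUN = sum)
counts
```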