JohnnyDeer - 3 months ago

R - Transforming DataFrame

I created an example of my data structure below.

Problem 1: I found that "days" is indeed the difference between $start and $end, but it does not reflect the actual number of measurement days. So for each id in $id I need a counter of its rows. As a result, id 2 should have 2 days instead of 4.

Solution:

# rle() on the sorted ids: $values holds each id, $lengths its number of rows
Count <- rle(sort(activity$id))
activity$count <- Count$lengths[match(activity$id, Count$values)]
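A quick check of the counter, as a sketch on a minimal stand-in data frame that holds only the id column (ids and row counts copied from the example table below). It also shows base R's ave(), which is a swapped-in alternative, not part of the original solution: it computes the per-id count in one call, already aligned with the original rows.

```r
# Minimal stand-in: only the id column matters for the counter
# (ids and row counts copied from the example table).
activity <- data.frame(id = rep(1:6, times = c(4, 2, 4, 3, 4, 4)))

# The rle() approach, using the named components of the rle object
runs <- rle(sort(activity$id))
activity$count <- runs$lengths[match(activity$id, runs$values)]

# Alternative: ave() applies length() within each id group and returns
# a vector aligned with the original rows, so no sort/match step is needed
activity$count_ave <- ave(activity$id, activity$id, FUN = length)

# One row per id with its count
print(unique(activity[, c("id", "count")]))
```

Because ave() keeps the original row order, it also works when the data is not sorted by id.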


Problem 2: Afterwards, all ids that do not have exactly 4 days of measurement must be deleted. In this case, ids 1, 3, 5 and 6 would remain, because ids 2 and 4 have only 2 and 3 data points, respectively.

Solution:

# keep only ids with exactly 4 measurement days
activity <- subset(activity, count == 4)
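A sketch of the filtering step on a minimal stand-in (only the id column, copied from the example table). The required number of days is pulled out into a hypothetical parameter, required_days, since the example uses 4 but real data may need a different value. The second option, using table() and %in%, avoids the helper column altogether.

```r
# Minimal stand-in for the data: only the id column is needed here
# (ids and row counts copied from the example table).
activity <- data.frame(id = rep(1:6, times = c(4, 2, 4, 3, 4, 4)))
activity$count <- ave(activity$id, activity$id, FUN = length)

# Hypothetical parameter: 4 for the example data
required_days <- 4

# Option 1: filter on the helper column built in step 1
kept <- subset(activity, count == required_days)

# Option 2: no helper column; table() counts rows per id and
# %in% keeps the rows whose id has exactly the required count
counts <- table(activity$id)
keep_ids <- as.integer(names(counts)[counts == required_days])
kept2 <- activity[activity$id %in% keep_ids, ]

print(unique(kept$id))
```

Both options keep the rows in their original order, so they return the same ids (1, 3, 5 and 6 for the example).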


Problem 3: I need to keep only the ids that have a row marked "finished" in $status. Here, only ids 1, 3 and 6 would remain after all adjustments.

What would each step look like in R?

id  status    energy  sun  start     end       days
1   ok        10      10   01/05/16  01/09/16  4
1   ok        20      20   01/05/16  01/09/16  4
1   ok        30      30   01/05/16  01/09/16  4
1   finished  40      40   01/05/16  01/09/16  4
2   ok        0       5    12/06/15  12/10/15  4
2   failed    0       5    12/06/15  12/10/15  4
3   ok        10      5    12/26/15  12/30/15  4
3   ok        20      10   12/26/15  12/30/15  4
3   ok        30      15   12/26/15  12/30/15  4
3   finished  40      20   12/26/15  12/30/15  4
4   ok        10      0    07/09/15  07/12/15  3
4   ok        15      10   07/09/15  07/12/15  3
4   failed    5       10   07/09/15  07/12/15  3
5   ok        10      5    11/16/15  11/20/15  4
5   ok        12      10   11/16/15  11/20/15  4
5   ok        18      15   11/16/15  11/20/15  4
5   failed    20      20   11/16/15  11/20/15  4
6   ok        10      20   12/31/15  01/04/16  4
6   ok        20      30   12/31/15  01/04/16  4
6   ok        30      35   12/31/15  01/04/16  4
6   finished  40      45   12/31/15  01/04/16  4
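To make the example reproducible, the table can be read straight into a data frame with read.table(text = ...). This is a sketch; it assumes the dates are month/day/two-digit-year, which matches the values shown. Parsing them with as.Date() confirms the observation in Problem 1 that days is simply end minus start.

```r
# Rebuild the example data from the table above (values copied verbatim)
activity <- read.table(text = "
id status   energy sun start    end      days
1  ok       10     10  01/05/16 01/09/16 4
1  ok       20     20  01/05/16 01/09/16 4
1  ok       30     30  01/05/16 01/09/16 4
1  finished 40     40  01/05/16 01/09/16 4
2  ok       0      5   12/06/15 12/10/15 4
2  failed   0      5   12/06/15 12/10/15 4
3  ok       10     5   12/26/15 12/30/15 4
3  ok       20     10  12/26/15 12/30/15 4
3  ok       30     15  12/26/15 12/30/15 4
3  finished 40     20  12/26/15 12/30/15 4
4  ok       10     0   07/09/15 07/12/15 3
4  ok       15     10  07/09/15 07/12/15 3
4  failed   5      10  07/09/15 07/12/15 3
5  ok       10     5   11/16/15 11/20/15 4
5  ok       12     10  11/16/15 11/20/15 4
5  ok       18     15  11/16/15 11/20/15 4
5  failed   20     20  11/16/15 11/20/15 4
6  ok       10     20  12/31/15 01/04/16 4
6  ok       20     30  12/31/15 01/04/16 4
6  ok       30     35  12/31/15 01/04/16 4
6  finished 40     45  12/31/15 01/04/16 4
", header = TRUE, stringsAsFactors = FALSE)

# Parse the dates (assumed format: month/day/two-digit year)
activity$start <- as.Date(activity$start, format = "%m/%d/%y")
activity$end   <- as.Date(activity$end,   format = "%m/%d/%y")

# days equals the calendar span end - start, not the number of rows per id
span <- as.numeric(activity$end - activity$start, units = "days")
print(all(span == activity$days))
```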

Answer

You want to apply functions to a data frame split by a grouping variable (in your case, id). In base R, that is what by() and its relative tapply() do. Suppose d is your data:

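# count the rows of each id; index by name, not position, so ids need not be 1..n
d$days <- as.vector(tapply(d$id, d$id, length)[as.character(d$id)])
# keep only ids with exactly 4 measurement days
d <- subset(d, days == 4)
# keep only ids that have at least one "finished" row
d <- do.call(rbind,
  by(d, d$id, function(x) if ("finished" %in% x$status) x else NULL)
)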
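The last step can also be written without by(): collect the ids that have a "finished" row, then keep every row whose id is in that set. The sketch below runs all three steps on a minimal stand-in with only id and status (copied from the example table). It swaps tapply() for ave() in step 1, which is a substitution, not the method above, and the helper name finished_ids is hypothetical.

```r
# Minimal stand-in: id and status copied from the example table
d <- data.frame(
  id = rep(1:6, times = c(4, 2, 4, 3, 4, 4)),
  status = c("ok", "ok", "ok", "finished",
             "ok", "failed",
             "ok", "ok", "ok", "finished",
             "ok", "ok", "failed",
             "ok", "ok", "ok", "failed",
             "ok", "ok", "ok", "finished"),
  stringsAsFactors = FALSE
)

# Step 1: number of rows per id, aligned with the original rows
d$days <- ave(d$id, d$id, FUN = length)

# Step 2: keep ids with exactly 4 measurement days
d <- subset(d, days == 4)

# Step 3: keep ids that have at least one "finished" row
finished_ids <- unique(d$id[d$status == "finished"])
d <- subset(d, id %in% finished_ids)

print(unique(d$id))
```

Unlike the do.call(rbind, by(...)) version, this keeps the original row names and needs no per-group function.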