JohnnyDeer - 1 year ago 95
R Question

# R - Transforming DataFrame

I created an example of my data structure below.

Problem 1: I found out that "days" is indeed the difference between \$start and \$end, but it does not reflect the actual number of days measured. So for each id in \$id, I need a count of its rows. As a result, id=2 should have the value "2" days instead of "4".

Solution:

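``````
# Run-length encode the sorted ids: $lengths holds the row count per id, $values the ids
Count <- rle(sort(activity$id))
# Look up each row's id among the run values and take the matching count
activity$count <- Count$lengths[match(activity$id, Count$values)]
``````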
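For comparison, here is a minimal sketch of the same per-id count using base R's `ave()`, which returns one value per row and so avoids the sort-and-match step. The toy data frame is hypothetical and only stands in for `activity`.

```r
# Toy data (hypothetical): three ids with 4, 2 and 3 rows
activity <- data.frame(id = c(1, 1, 1, 1, 2, 2, 4, 4, 4))

# ave() applies FUN within each group of `id` and returns a vector
# aligned with the original rows, so no sorting or matching is needed
activity$count <- ave(activity$id, activity$id, FUN = length)

# One row per id with its count
print(unique(activity))
```

Whether this is preferable to the `rle()` approach is mostly a matter of taste; both give the same counts.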

Problem 2: Afterwards, every id that does not have exactly 4 days of measurement must be deleted. In this case, ids 1, 3, 5 and 6 would survive, because ids 2 and 4 have only 2 and 3 data points, respectively.

Solution:

``````
# Keep only ids with exactly 4 rows (the threshold for this example; adjust for real data)
activity <- subset(activity, count == 4)
``````

Problem 3: I need to keep only the cases that are marked as "finished" in \$status. Here, only ids 1, 3 and 6 would survive after all adjustments.
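One possible reading of Problem 3 is to keep every row of an id that has at least one "finished" row. A minimal sketch under that assumption follows. The toy data and the helper name `finished_ids` are hypothetical.

```r
# Toy data (hypothetical): ids 1 and 6 finish, id 5 fails
activity <- data.frame(
  id     = c(1, 1, 5, 5, 6, 6),
  status = c("ok", "finished", "ok", "failed", "ok", "finished"),
  stringsAsFactors = FALSE
)

# Ids that have at least one row marked "finished"
finished_ids <- unique(activity$id[activity$status == "finished"])

# Keep all rows belonging to those ids
activity <- activity[activity$id %in% finished_ids, ]
print(activity)
```

If only the "finished" rows themselves are wanted, `subset(activity, status == "finished")` would be the simpler choice.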

What would each step look like in R?

``````
id  status   energy sun start       end         days
1   ok       10     10  01/05/16    01/09/16    4
1   ok       20     20  01/05/16    01/09/16    4
1   ok       30     30  01/05/16    01/09/16    4
1   finished 40     40  01/05/16    01/09/16    4
2   ok       0      5   12/06/15    12/10/15    4
2   failed   0      5   12/06/15    12/10/15    4
3   ok       10     5   12/26/15    12/30/15    4
3   ok       20     10  12/26/15    12/30/15    4
3   ok       30     15  12/26/15    12/30/15    4
3   finished 40     20  12/26/15    12/30/15    4
4   ok       10     0   07/09/15    07/12/15    3
4   ok       15     10  07/09/15    07/12/15    3
4   failed   5      10  07/09/15    07/12/15    3
5   ok       10     5   11/16/15    11/20/15    4
5   ok       12     10  11/16/15    11/20/15    4
5   ok       18     15  11/16/15    11/20/15    4
5   failed   20     20  11/16/15    11/20/15    4
6   ok       10     20  12/31/15    01/04/16    4
6   ok       20     30  12/31/15    01/04/16    4
6   ok       30     35  12/31/15    01/04/16    4
6   finished 40     45  12/31/15    01/04/16    4
``````
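To make the example reproducible, the table above can be read with `read.table(text = ...)` and the three steps checked against the expected survivors. This is only a sketch: the intermediate names `step2`, `finished_ids` and `result` are hypothetical, and Problem 3 is interpreted as "keep ids that have a finished row".

```r
activity <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
id  status   energy sun start       end         days
1   ok       10     10  01/05/16    01/09/16    4
1   ok       20     20  01/05/16    01/09/16    4
1   ok       30     30  01/05/16    01/09/16    4
1   finished 40     40  01/05/16    01/09/16    4
2   ok       0      5   12/06/15    12/10/15    4
2   failed   0      5   12/06/15    12/10/15    4
3   ok       10     5   12/26/15    12/30/15    4
3   ok       20     10  12/26/15    12/30/15    4
3   ok       30     15  12/26/15    12/30/15    4
3   finished 40     20  12/26/15    12/30/15    4
4   ok       10     0   07/09/15    07/12/15    3
4   ok       15     10  07/09/15    07/12/15    3
4   failed   5      10  07/09/15    07/12/15    3
5   ok       10     5   11/16/15    11/20/15    4
5   ok       12     10  11/16/15    11/20/15    4
5   ok       18     15  11/16/15    11/20/15    4
5   failed   20     20  11/16/15    11/20/15    4
6   ok       10     20  12/31/15    01/04/16    4
6   ok       20     30  12/31/15    01/04/16    4
6   ok       30     35  12/31/15    01/04/16    4
6   finished 40     45  12/31/15    01/04/16    4
")

# Problem 1: number of rows per id, repeated on every row of that id
activity$count <- ave(activity$id, activity$id, FUN = length)

# Problem 2: keep ids with exactly 4 rows
step2 <- subset(activity, count == 4)
print(unique(step2$id))   # 1 3 5 6

# Problem 3: keep ids that have a "finished" row
finished_ids <- unique(step2$id[step2$status == "finished"])
result <- step2[step2$id %in% finished_ids, ]
print(unique(result$id))  # 1 3 6
```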

You want to apply a function to a data frame split by a factor (in your case, `id`). In base R, the tools for this are `by()` and the related function `tapply()`. Suppose `d` is your data:
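``````
# Count rows per id, then index by name (not position) so non-consecutive ids still line up
d$days <- tapply(d$id, d$id, length)[as.character(d$id)]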