BWilker - 1 month ago
R Question

dplyr: Find date for which an event occurs

First question. I am fairly new to R. I have the following data frame.

Source: local data frame [865,264 x 10]

page_views date dayofweek daytype caseID dateDecision dateArgument dateRearg
1 169 2008-01-30 Wednesday 0 2007-001 2007-10-10 2007-10-01
2 211 2008-01-16 Wednesday 0 2007-001 2007-10-10 2007-10-01
3 203 2008-01-17 Thursday 0 2007-001 2007-10-10 2007-10-01
4 177 2008-01-14 Monday 0 2007-001 2007-10-10 2007-10-01
5 224 2008-01-15 Tuesday 0 2007-001 2007-10-10 2007-10-01
6 152 2008-01-12 Saturday 1 2007-001 2007-10-10 2007-10-01
7 149 2008-01-13 Sunday 1 2007-001 2007-10-10 2007-10-01
8 220 2008-01-10 Thursday 0 2007-001 2007-10-10 2007-10-01
9 169 2008-01-11 Friday 0 2007-001 2007-10-10 2007-10-01
10 189 2008-01-18 Friday 0 2007-001 2007-10-10 2007-10-01
.. ... ... ... ... ... ... ... ...
Variables not shown: caseName (chr), term (int)


For each caseID, I would like to find the earliest date on which page_views is greater than zero, and store that date in a new column. The result should have one row for each caseID.

I am hoping that I can do this using dplyr, but I am open to other solutions. Using dplyr it seems group_by(caseID) and some kind of filter are the place to start, but I have had no luck.

I have searched Stack Overflow and other places and haven't found anything that comes close.

Answer

With dplyr you can do this almost as it is written in your description.

library(dplyr)
# keep rows with views, sort by date, then take the first date in each caseID
x %>% group_by(caseID) %>% filter(page_views > 0) %>%
      arrange(date) %>% summarise(min_date = head(date, 1))
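
Here is a small self-contained sketch of the same idea that takes the minimum date directly instead of sorting first. The toy data frame below is made up for illustration; only the column names (caseID, date, page_views) come from the question. It assumes date is stored as a Date (or as ISO yyyy-mm-dd text, which sorts the same way).

```r
library(dplyr)

# Toy data: hypothetical values, same column names as in the question
x <- data.frame(
  caseID     = c("2007-001", "2007-001", "2007-001", "2007-002", "2007-002"),
  date       = as.Date(c("2008-01-10", "2008-01-12", "2008-01-11",
                         "2008-01-15", "2008-01-14")),
  page_views = c(0, 152, 169, 224, 0),
  stringsAsFactors = FALSE
)

first_view <- x %>%
  filter(page_views > 0) %>%        # keep only days with at least one view
  group_by(caseID) %>%              # one group per case
  summarise(min_date = min(date))   # earliest remaining date in each group

print(first_view)
```

Note that a caseID with no rows where page_views > 0 is dropped by the filter, so it will not appear in the result at all.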