Hamid Oskorouchi - 1 month ago
R Question

R: Count event occurrences and assign them to individuals according to date and place of interview

I have two data frames in R, A.df and B.df. The first contains N rows, where each row is an event that happened on a certain date at a certain place.

The second is a list of individuals who were interviewed on a certain date at a certain place.

For each individual, I would like to count the number of events that happened within a certain time frame before the interview date, at the same place where the individual was interviewed.

Let's say that the time frame is x days before the date of interview, and that I have already computed that start date and stored it in the variable xdaysbefore.
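If x is a fixed number of days, xdaysbefore can be computed directly, because subtracting a number from a Date in R shifts it back by that many days. A minimal sketch, assuming a hypothetical x of 30 days and the column name Date_of_Interview (the name used in the answer below):

```r
# Hypothetical window length in days (an assumption, not taken from the question)
x <- 30

B.df <- data.frame(
  Individual        = 1:4,
  Date_of_Interview = as.Date(c("2016-07-11", "2016-05-07", "2016-08-09", "2016-01-10")),
  Place             = c(1, 3, 2, 3)
)

# Date minus a number gives the Date that many days earlier
B.df$xdaysbefore <- B.df$Date_of_Interview - x
print(B.df)
```

Because the result is still of class Date, it can be compared directly with other Date columns later on.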

Here is what the data frames look like:

A.df


#Event Date Place
1 2015-05-01 1
2 2015-03-11 1
3 2015-07-04 2
4 2015-05-10 3


B.df


#Individual Date of Interview Place xdaysbefore
1 2016-07-11 1 2014-09-11
2 2016-05-07 3 2014-07-04
3 2016-08-09 2 2014-03-22
4 2016-01-10 3 2014-09-17


Note that Date, Date of Interview and xdaysbefore are all of class Date in R.
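To make the example reproducible, the two data frames can be rebuilt with their date columns stored as Date objects. This is a sketch: the column names Event and Date_of_Interview are assumptions (a column literally named "Date of Interview" would need backquotes everywhere it is used).

```r
# Events: one row per event, with its date and place
A.df <- data.frame(
  Event = 1:4,
  Date  = as.Date(c("2015-05-01", "2015-03-11", "2015-07-04", "2015-05-10")),
  Place = c(1, 1, 2, 3)
)

# Individuals: interview date, place, and start of the look-back window
B.df <- data.frame(
  Individual        = 1:4,
  Date_of_Interview = as.Date(c("2016-07-11", "2016-05-07", "2016-08-09", "2016-01-10")),
  Place             = c(1, 3, 2, 3),
  xdaysbefore       = as.Date(c("2014-09-11", "2014-07-04", "2014-03-22", "2014-09-17"))
)

# Confirm the column classes
str(A.df)
str(B.df)
```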


How can I count, for each individual in B.df, the events that happened within the time frame from xdaysbefore to Date of Interview, at the place where the individual was interviewed?

The expected result in B.df would look like this:

B.df


#Individual Date of Interview Place xdaysbefore CountedEvents
1 2016-07-11 1 2014-09-11 2
2 2016-05-07 3 2014-07-04 1
3 2016-08-09 2 2014-03-22 1
4 2016-01-10 3 2014-09-17 1


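where CountedEvents is the number of events that happened in the time frame from xdaysbefore to Date of Interview at the same place where individual i was interviewed. For example, individual 1 was interviewed at place 1, where events 1 (2015-05-01) and 2 (2015-03-11) both fall between 2014-09-11 and 2016-07-11, so CountedEvents is 2.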

Answer

You can use apply on every row of B.df.

For each row, take the subset of A.df whose Place equals the individual's Place, then count the events whose Date lies between xdaysbefore and Date_of_Interview.
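The counting step can be tried on its own first. Comparing Date against both bounds yields one TRUE/FALSE per event; sum() counts the TRUE values, whereas length() returns the number of events regardless of the condition. A small sketch with hypothetical dates:

```r
# Three hypothetical events at the same place; only the second lies in the window
event_dates <- as.Date(c("2013-01-01", "2015-05-01", "2017-01-01"))
start <- as.Date("2014-09-11")  # xdaysbefore
end   <- as.Date("2016-07-11")  # Date_of_Interview

in_window <- event_dates > start & event_dates < end
print(in_window)          # FALSE TRUE FALSE
print(length(in_window))  # 3: every event, whatever the condition
print(sum(in_window))     # 1: only the events inside the window
```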

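B.df$CountedEvents <- apply(B.df, 1, function(x) {
    # apply() passes each row as a character vector, so look up columns by
    # name and trim the padding that format() may add to numeric columns
    temp <- A.df[A.df$Place == trimws(x["Place"]), ]
    # sum() counts the events inside the window; length() would count them all
    sum(temp$Date < as.Date(x["Date_of_Interview"]) & temp$Date > as.Date(x["xdaysbefore"]))
 })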

B.df
#     Individual Date_of_Interview Place xdaysbefore CountedEvents
#1          1        2016-07-11     1      2014-09-11       2
#2          2        2016-05-07     3      2014-07-04       1
#3          3        2016-08-09     2      2014-03-22       1
#4          4        2016-01-10     3      2014-09-17       1
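An alternative that avoids apply(), which converts each row of B.df to a character vector, is to join the two tables on Place with merge() and then count the matching events per individual. This is a sketch, assuming the column names used above, unique Individual IDs, and the same strict (exclusive) window bounds. The join holds one row per individual/event pair at the same place, so it can get large when many events share a place.

```r
A.df <- data.frame(
  Event = 1:4,
  Date  = as.Date(c("2015-05-01", "2015-03-11", "2015-07-04", "2015-05-10")),
  Place = c(1, 1, 2, 3)
)
B.df <- data.frame(
  Individual        = 1:4,
  Date_of_Interview = as.Date(c("2016-07-11", "2016-05-07", "2016-08-09", "2016-01-10")),
  Place             = c(1, 3, 2, 3),
  xdaysbefore       = as.Date(c("2014-09-11", "2014-07-04", "2014-03-22", "2014-09-17"))
)

# Pair every individual with every event at the same place
m <- merge(B.df, A.df, by = "Place")

# Keep only events strictly inside (xdaysbefore, Date_of_Interview)
hits <- m[m$Date > m$xdaysbefore & m$Date < m$Date_of_Interview, ]

# Count hits per individual, in B.df order; individuals with no hits get 0
counts <- table(factor(hits$Individual, levels = B.df$Individual))
B.df$CountedEvents <- as.integer(counts)
print(B.df)
```

Using factor() with explicit levels makes table() report a zero for individuals whose window contains no events, instead of silently dropping them.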