user181187 - 3 years ago 125
R Question

# data.frame - Make observations independent within a 3 day window

I have got a

`data.frame`
with 4 columns and ca. 6000 rows.

The columns are:

`ID`
,
`Day`
(the number of day since the first day of the time period considered),
`Year`
(the year in which the observation has been recorded) and
`Count`
(the number of observations pooled, between all the IDs, in that particular day).

``````df = read.table(text = 'ID     Day         Year   Count
33012   12448         2001      46
35004   12448         2001      46
35008   12448         2001      46
37006   12448         2001      46
21009    4835         1980      44
24005    4835         1980      44
27001    4835         1980      44
27002    4835         1980      44
25005    5569         1982      34
29001    5569         1982      34
29002    5569         1982      34
30003    5569         1982      34', header = TRUE)
``````

I need to create a time window of three days and run a for loop for each Day, counting the number of observations in that time range.

e.g. starting from Day 12448 (or "day 0") I need to check in all the dataframe for Day 12447 (day prior or "day -1") and Day 12449 (day after or "day +1") if exist observations recorded.
In other words, taking di = 12448, there exist any "di +1" and/or "di -1" in the dataframe?
If yes, I have to delete from the dataframe "di +1" and/or "di -1" in order to avoid overlapping and add both their "Count" values into the "di" 's "Count" observation.

Do you have any hint that can help me making the for loop?

@thepule, thanks a lot. I tried to run your code within my dataset. So I created a vector with all the days in the column "Day"

``````days <- unique(df\$Day)
``````

and adjusted the for loop appropriately, but it doesn't work, in the sense that I obtain very low values in the column Count.

Where is the mistake?

Here an example of my dataframe:

``````df = read.table(text ='ID    Day       Year     Count
33012  12448       2001         5
35004  12448       2001         5
35008  12448       2001         5
37006  12448       2001         5
37008  12448       2001         5
27900  12800       2002         4
27987  12800       2002         4
27123  12800       2002         4
27341  12800       2002         4
56124  14020       2003         3
12874  14020       2003         3
11447  14020       2003         3
11231  12447       2001         2
31879  12447       2001         2
56784  12449       2001         1
64148  12799       2002         1
45613  12801       2001         1
77632  10324       1991         1
55313  14002       2003         1
11667  14019       2003         1', header = TRUE)
``````

My output, after the for loop, should be:

``````         ID    Day       Year     Count
1     33012  12448       2001         8
2     35004  12448       2001         8
3     35008  12448       2001         8
4     37006  12448       2001         8
5     37008  12448       2001         8
6     27900  12800       2002         6
7     27987  12800       2002         6
8     27123  12800       2002         6
9     27341  12800       2002         6
10    56124  14020       2003         4
11    12874  14020       2003         4
12    11447  14020       2003         4
13    77632  10324       1991         1
14    55313  14002       2003         1
``````

n.b each observations, for each ID, are max 1 per year.
n.b.b. the Count column is ordered as decreasing = TRUE

``````# Create data frame
ID    Day       Year     Count
33012  12448       2001         5
35004  12448       2001         5
35008  12448       2001         5
37006  12448       2001         5
37008  12448       2001         5
27900  12800       2002         4
27987  12800       2002         4
27123  12800       2002         4
27341  12800       2002         4
56124  14020       2003         3
12874  14020       2003         3
11447  14020       2003         3
11231  12447       2001         2
31879  12447       2001         2
56784  12449       2001         1
64148  12799       2002         1
45613  12801       2001         1
77632  10324       1991         1
55313  14002       2003         1
11667  14019       2003         1", header= T)

# Vector of day targets you want to repeat the procedure for
targets <- unique(tt\$Day)

for (i in targets) {
temp <- tt\$Count[tt\$Day == i]
if(length(temp >0)) {
condition <- tt\$Day == i - 1
if(any(condition)) {
tt\$Count[tt\$Day == i] <- mean(tt\$Count[condition]) +  tt\$Count[tt\$Day == i]
tt <- tt[!condition,]
}
condition2 <- tt\$Day == i + 1
if(any(condition2)) {
tt\$Count[tt\$Day == i] <- mean(tt\$Count[condition2]) +  tt\$Count[tt\$Day == i]
tt <- tt[!condition2,]
}

}
}
``````

Output:

``````tt
ID   Day Year Count
1  33012 12448 2001     8
2  35004 12448 2001     8
3  35008 12448 2001     8
4  37006 12448 2001     8
5  37008 12448 2001     8
6  27900 12800 2002     6
7  27987 12800 2002     6
8  27123 12800 2002     6
9  27341 12800 2002     6
10 56124 14020 2003     4
11 12874 14020 2003     4
12 11447 14020 2003     4
18 77632 10324 1991     1
19 55313 14002 2003     1
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download