user181187 user181187 - 1 month ago 3
R Question

data.frame - Make observations independent within a 3 day window

I have got a

data.frame
with 4 columns and ca. 6000 rows.

The columns are:

ID
,
Day
(the number of day since the first day of the time period considered),
Year
(the year in which the observation has been recorded) and
Count
(the number of observations pooled, between all the IDs, in that particular day).

df = read.table(text = 'ID Day Year Count
33012 12448 2001 46
35004 12448 2001 46
35008 12448 2001 46
37006 12448 2001 46
21009 4835 1980 44
24005 4835 1980 44
27001 4835 1980 44
27002 4835 1980 44
25005 5569 1982 34
29001 5569 1982 34
29002 5569 1982 34
30003 5569 1982 34', header = TRUE)


I need to create a time window of three days and run a for loop for each Day, counting the number of observations in that time range.

e.g. starting from Day 12448 (or "day 0") I need to check in all the dataframe for Day 12447 (day prior or "day -1") and Day 12449 (day after or "day +1") if exist observations recorded.
In other words, taking di = 12448, there exist any "di +1" and/or "di -1" in the dataframe?
If yes, I have to delete from the dataframe "di +1" and/or "di -1" in order to avoid overlapping and add both their "Count" values into the "di" 's "Count" observation.

Do you have any hint that can help me making the for loop?




@thepule, thanks a lot. I tried to run your code within my dataset. So I created a vector with all the days in the column "Day"

days <- unique(df$Day)


and adjusted the for loop appropriately, but it doesn't work, in the sense that I obtain very low values in the column Count.

Where is the mistake?




Here an example of my dataframe:

df = read.table(text ='ID Day Year Count
33012 12448 2001 5
35004 12448 2001 5
35008 12448 2001 5
37006 12448 2001 5
37008 12448 2001 5
27900 12800 2002 4
27987 12800 2002 4
27123 12800 2002 4
27341 12800 2002 4
56124 14020 2003 3
12874 14020 2003 3
11447 14020 2003 3
11231 12447 2001 2
31879 12447 2001 2
56784 12449 2001 1
64148 12799 2002 1
45613 12801 2001 1
77632 10324 1991 1
55313 14002 2003 1
11667 14019 2003 1', header = TRUE)


My output, after the for loop, should be:

ID Day Year Count
1 33012 12448 2001 8
2 35004 12448 2001 8
3 35008 12448 2001 8
4 37006 12448 2001 8
5 37008 12448 2001 8
6 27900 12800 2002 6
7 27987 12800 2002 6
8 27123 12800 2002 6
9 27341 12800 2002 6
10 56124 14020 2003 4
11 12874 14020 2003 4
12 11447 14020 2003 4
13 77632 10324 1991 1
14 55313 14002 2003 1


n.b each observations, for each ID, are max 1 per year.
n.b.b. the Count column is ordered as decreasing = TRUE

Answer

Updated answer:

# Create data frame    
tt <- read.table(text = "
        ID    Day       Year     Count
 33012  12448       2001         5
 35004  12448       2001         5
 35008  12448       2001         5
 37006  12448       2001         5
 37008  12448       2001         5
 27900  12800       2002         4
 27987  12800       2002         4
 27123  12800       2002         4
 27341  12800       2002         4
 56124  14020       2003         3
 12874  14020       2003         3
 11447  14020       2003         3
 11231  12447       2001         2
 31879  12447       2001         2
 56784  12449       2001         1
 64148  12799       2002         1
 45613  12801       2001         1
 77632  10324       1991         1
 55313  14002       2003         1
 11667  14019       2003         1", header= T)


# Vector of day targets you want to repeat the procedure for
targets <- unique(tt$Day)


for (i in targets) {
    temp <- tt$Count[tt$Day == i]
    if(length(temp >0)) {
            condition <- tt$Day == i - 1
            if(any(condition)) {
                    tt$Count[tt$Day == i] <- mean(tt$Count[condition]) +  tt$Count[tt$Day == i]
                    tt <- tt[!condition,]
            }
            condition2 <- tt$Day == i + 1
            if(any(condition2)) {
                    tt$Count[tt$Day == i] <- mean(tt$Count[condition2]) +  tt$Count[tt$Day == i]
                    tt <- tt[!condition2,]
    }

    }
}

Output:

tt
  ID   Day Year Count
1  33012 12448 2001     8
2  35004 12448 2001     8
3  35008 12448 2001     8
4  37006 12448 2001     8
5  37008 12448 2001     8
6  27900 12800 2002     6
7  27987 12800 2002     6
8  27123 12800 2002     6
9  27341 12800 2002     6
10 56124 14020 2003     4
11 12874 14020 2003     4
12 11447 14020 2003     4
18 77632 10324 1991     1
19 55313 14002 2003     1
Comments