Adam Robinsson - 1 year ago 382
R Question

# Rolling mean (moving average) by group/id with dplyr

I have a longitudinal follow-up of blood pressure recordings.

The value at a certain point is less predictive than is the moving average (rolling mean), which is why I'd like to calculate it. The data looks like

``````test <- read.table(header=TRUE, text = "
ID  AGE   YEAR_VISIT  BLOOD_PRESSURE  TREATMENT
1 20  2000    NA 3
1 21  2001    129 2
1 22  2002    145 3
1 22  2002    130 2
2 23  2003    NA  NA
2 30  2010    150 2
2 31  2011    110 3
4 50  2005    140 3
4 50  2005    130 3
4 50  2005    NA  3
4 51  2006    312 2
5 27  2010    140 4
5 28  2011    170 4
5 29  2012    160 NA
7 40  2007    120 NA
")
``````

I'd like to calculate a new variable, called BLOOD_PRESSURE_UPDATED. This variable should be the moving average for BLOOD_PRESSURE and have the following characteristics:

• A moving average is the current value plus the previous value divided by two.

• For the first observation, the BLOOD_PRESSURE_UPDATED is just the current BLOOD_PRESSURE. If that is
missing, BLOOD_PRESSURE_UPDATED should be the overall mean.

• Missing values should be filled in with nearest previous value.

I've tried the following:

``````test2 <- test %>%
group_by(ID) %>%
arrange(ID, YEAR_VISIT) %>%
mutate(BLOOD_PRESSURE_UPDATED = rollmean(x=BLOOD_PRESSURE, 2)) %>%
ungroup()
``````

I have also tried rollaply and rollmeanr without succeeding.

I'd appreciate some assistance.

If you are not committed to to `dplyr` this should work:

``````get.mav <- function(bp,n=2){
require(zoo)
if(is.na(bp[1])) bp[1] <- mean(bp,na.rm=TRUE)
bp <- na.locf(bp,na.rm=FALSE)
if(length(bp)<n) return(bp)
c(bp[1:(n-1)],rollapply(bp,width=n,mean,align="right"))
}
test <- with(test,test[order(ID,YEAR_VISIT),])

test\$BLOOD_PRESSURE_UPDATED <-
unlist(aggregate(BLOOD_PRESSURE~ID,test,get.mav,na.action=NULL,n=2)\$BLOOD_PRESSURE)
test
#    ID AGE YEAR_VISIT BLOOD_PRESSURE TREATMENT BLOOD_PRESSURE_UPDATED
# 1   1  20       2000             NA         3               134.6667
# 2   1  21       2001            129         2               131.8333
# 3   1  22       2002            145         3               137.0000
# 4   1  22       2002            130         2               137.5000
# 5   2  23       2003             NA        NA               130.0000
# 6   2  30       2010            150         2               140.0000
# 7   2  31       2011            110         3               130.0000
# ...
``````

This works for moving averages > 2 as well.

And here's a data.table solution, which is likely to be much faster if your dataset is large.

``````library(data.table)
setDT(test)     # converts test to a data.table in place
setkey(test,ID,YEAR_VISIT)
test[,BLOOD_PRESSURE_UPDATED:=as.numeric(get.mav(BLOOD_PRESSURE,2)),by=ID]
test
#    ID AGE YEAR_VISIT BLOOD_PRESSURE TREATMENT BLOOD_PRESSURE_UPDATED
#  1:  1  20       2000             NA         3               134.6667
#  2:  1  21       2001            129         2               131.8333
#  3:  1  22       2002            145         3               137.0000
#  4:  1  22       2002            130         2               137.5000
#  5:  2  23       2003             NA        NA               130.0000
#  6:  2  30       2010            150         2               140.0000
#  7:  2  31       2011            110         3               130.0000
# ...
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download