user24318 - 4 months ago
R Question

Cumulative sum over a visit, each participant has different number of visits

I am working on simulating a longitudinal data set with irregular visit structure.

I want to add a column for the participant's age at each visit, starting from the baseline visit.

For example, if the baseline age is 65 and the time intervals are (1, 1.2, 2, 2.5), then I want to generate a new variable called "age.at.visit" containing the ages (65, 66, 67.2, 69.2, 71.7).

Basically, I want to add the intervals cumulatively to the baseline age of 65, i.e. (65+1, 65+1+1.2, 65+1+1.2+2, 65+1+1.2+2+2.5). I have simulated the age at baseline, and I want to add the time intervals to the baseline age based on the number of visits the participant had. I am struggling to generate the cumulative sum and need help. Here is my attempt:

maxvst = 10; # maximum number of visits

nsubj = 100; #number of participants or subjects
nvstsubj = sample(1:maxvst,nsubj,replace=TRUE) # generate the number of visits for each subject

bage=runif(nsubj,65,100) #baseline age
subj=rep(1:nsubj, nvstsubj) # subject ids

#generate visits and age of participants
visit=rep(0,length(subj))
age =rep(0,length(subj))

for (i in 1:nsubj){
  idx = subj==i
  vi = nvstsubj[i]
  visit[idx] = 1:vi
  intervals = runif(vi-1,1,3) #generate time intervals to add to baseline age
  # generate age at each visit
  age[idx]= # ??? cumulative sum over interval
}

Answer

This type of thing is often best kept in a list. I recreated your data to keep it all in a single data frame.

df <- data.frame(id = 1:100, 
           num_visits=sample(1:10,100,replace=TRUE),
           base_age = runif(100, 65, 100))

The data looks like this:

head(df,4)
  id num_visits base_age
1  1          2 67.90497
2  2          3 70.77535
3  3          6 97.05501
4  4          6 77.31996

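Then I applied the cumsum function to the simulated durations between consecutive visits and added the running total to each row's base age. Note that, as long as at least one interval was generated, you need to concatenate the base age to the front of the vector so that the baseline visit is included.

a <- apply(df, 1, function(x) {
  # base age plus the running total of the simulated intervals
  temp <- as.numeric(x["base_age"] + cumsum(runif(x["num_visits"], 1, 2)))
  # prepend the base age so the vector starts at the baseline visit
  if (length(temp) > 0) temp <- c(x["base_age"], temp)
  temp
})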

The solution looks like this:

[[1]]
base_age                   
67.90497 69.85027 71.30138 

[[2]]
base_age                            
70.77535 72.34506 73.88659 75.21282 

[[3]]
 base_age                                                             
 97.05501  98.57490 100.01887 101.50815 102.52040 104.36888 105.62224 

[[4]]
base_age                                                       
77.31996 78.65842 80.17729 82.10347 83.60191 85.11311 86.18387
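
Note that the snippet above draws num_visits intervals between 1 and 2, so each vector has one more entry than num_visits (id 1 has 2 visits but 3 ages). If you want exactly one age per visit and the 1 to 3 interval range from the question, the following is a minimal sketch that fills in the missing line of the original loop. It keeps the question's variable names; the fixed seed and the data frame name `long` are assumptions added only for illustration. The idea is that elapsed time is 0 at baseline, so the ages are the baseline age plus `c(0, cumsum(intervals))`. The first line checks this against the example from the question, which should give (65, 66, 67.2, 69.2, 71.7).

```r
# Worked check of the example from the question:
# baseline age 65, intervals (1, 1.2, 2, 2.5).
example_age <- 65 + c(0, cumsum(c(1, 1.2, 2, 2.5)))
print(example_age)

set.seed(1)  # assumption: fixed seed only so the sketch is reproducible

maxvst <- 10    # maximum number of visits
nsubj  <- 100   # number of participants or subjects
nvstsubj <- sample(1:maxvst, nsubj, replace = TRUE)  # number of visits per subject
bage <- runif(nsubj, 65, 100)                        # baseline age
subj <- rep(1:nsubj, nvstsubj)                       # subject id for every row

visit <- integer(length(subj))
age   <- numeric(length(subj))

for (i in seq_len(nsubj)) {
  idx <- subj == i
  vi  <- nvstsubj[i]
  visit[idx] <- seq_len(vi)
  # vi visits have vi - 1 gaps; runif(0, ...) returns an empty vector
  intervals <- runif(vi - 1, 1, 3)
  # elapsed time is 0 at baseline, then the running total of the gaps
  age[idx] <- bage[i] + c(0, cumsum(intervals))
}

# "long" is a hypothetical name for the one-row-per-visit result
long <- data.frame(subj = subj, visit = visit, age.at.visit = age)
head(long)
```

Because `c(0, cumsum(intervals))` always has length `vi`, a participant with a single visit simply gets the baseline age, and no special case is needed.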