baha-kev baha-kev - 1 month ago 22
R Question

block bootstrap from subject list

I'm trying to efficiently implement a block bootstrap technique to get the distribution of regression coefficients. The main outline is as follows:

I have a panel data set, say Grunfeld (used in the example below), where firm and year are the indices. For each iteration of the bootstrap, I wish to sample n subjects with replacement. From this sample, I need to construct a new data frame that is an rbind() stack of all the observations for each sampled subject. With this new data frame, I can run the regression and pull out the coefficients. Repeat for a bunch of iterations, say 100.

  • Each firm can potentially be selected multiple times, so I need to include its data multiple times in each iteration's data set.

  • Using a loop and subset approach, like below, seems computationally burdensome.

  • My real data frame, n, and number of iterations are much larger than in the example below.
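
A minimal sketch of the duplication point in the bullets above, assuming a small simulated panel (hypothetical data that only borrows Grunfeld's column names, so it runs without plm). Row numbers are grouped by firm once; a firm that is drawn twice then contributes its rows twice, and the stacked data frame comes from a single indexing call rather than repeated subset() and rbind():

```r
# Simulated stand-in for the panel (hypothetical data, Grunfeld-style columns)
set.seed(1)
panel <- data.frame(
  firm    = rep(1:10, each = 20),
  year    = rep(1935:1954, times = 10),
  inv     = runif(200, 10, 500),
  capital = runif(200, 10, 500)
)
panel$value <- 100 + 3 * panel$inv - panel$capital + rnorm(200, sd = 50)

# Row numbers of each firm, computed once
rows_by_firm <- split(seq_len(nrow(panel)), panel$firm)

# One bootstrap draw: sample firm labels with replacement
picked <- sample(names(rows_by_firm), length(rows_by_firm), replace = TRUE)

# Repeated labels repeat the corresponding row numbers
idx <- unlist(rows_by_firm[picked], use.names = FALSE)
newdata <- panel[idx, ]
```

Comparing table(picked) with table(newdata$firm) should show each firm's rows appearing once per draw of that firm.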

My initial thought is to break the full data frame into a list by firm using the split() command. From there, use sample() to get the new list, then perhaps use a function from another package to construct a new data frame?
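
The idea in the paragraph above might look roughly like this. It is only a sketch under assumptions: the panel is simulated (hypothetical data with Grunfeld-style column names), and base R's do.call(rbind, ...) stands in for whatever faster data frame constructor one might prefer.

```r
# Simulated stand-in for the panel (hypothetical data, Grunfeld-style columns)
set.seed(1)
panel <- data.frame(
  firm    = rep(1:10, each = 20),
  year    = rep(1935:1954, times = 10),
  inv     = runif(200, 10, 500),
  capital = runif(200, 10, 500)
)
panel$value <- 100 + 3 * panel$inv - panel$capital + rnorm(200, sd = 50)

# Split once, outside the bootstrap loop: one data frame per firm
by_firm <- split(panel, panel$firm)

n <- length(by_firm)
iterations <- 100

# One row of coefficients per bootstrap iteration
boot_coefs <- t(vapply(seq_len(iterations), function(j) {
  # Sample firm labels with replacement
  picked <- sample(names(by_firm), n, replace = TRUE)
  # Index the list by label; repeated firms are stacked repeatedly
  newdata <- do.call(rbind, unname(by_firm[picked]))
  coef(lm(value ~ inv + capital, data = newdata))
}, numeric(3)))
```

Splitting once means each iteration only looks up list elements instead of scanning the whole data frame with subset() for every sampled firm.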

Any thoughts are appreciated!

Example slow code:

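data("Grunfeld", package="plm")

firms = unique(Grunfeld$firm)
n = 10
iterations = 100
mybootresults = list()   # one coefficient vector per iteration

for(j in 1:iterations){

  # draw n firms with replacement
  v = sample(firms, n, replace=TRUE)
  newdata = NULL

  for(i in 1:n){
    newdata = rbind(newdata, subset(Grunfeld, firm == v[i]))
  }

  reg1 = lm(value ~ inv + capital, data = newdata)
  mybootresults[[j]] = coefficients(reg1)
}

# one row per iteration, one column per coefficient
mybootresults = as.data.frame(t(matrix(unlist(mybootresults), ncol=iterations)))
names(mybootresults) = names(reg1$coefficients)
head(mybootresults, 5)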

  (Intercept)      inv    capital
1    373.8591 6.981309 -0.9801547
2    370.6743 6.633642 -1.4526338
3    528.8436 6.960226 -1.1597901
4    331.6979 6.239426 -1.0349230
5    507.7339 8.924227 -2.8661479


How about something like this:

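library(boot)
data("Grunfeld", package="plm")

myfit <- function(x, i) {
   # x is the vector of firms; i holds the resampled positions in x
   mydata <- do.call("rbind", lapply(i, function(n) subset(Grunfeld, firm==x[n])))
   coefficients(lm(value ~ inv + capital, data = mydata))
}

firms <- unique(Grunfeld$firm)

b0 <- boot(firms, myfit, 999)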
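
Once a boot object like b0 exists, the replicated coefficients are stored in its t component and the full-sample estimates in t0. The following is a hedged sketch of how one might summarise them, assuming a simulated panel (hypothetical data, Grunfeld-style columns) and a variation of the statistic that looks up pre-split firm blocks instead of calling subset() each time:

```r
library(boot)

# Simulated stand-in for the panel (hypothetical data, Grunfeld-style columns)
set.seed(42)
panel <- data.frame(
  firm    = rep(1:10, each = 20),
  year    = rep(1935:1954, times = 10),
  inv     = runif(200, 10, 500),
  capital = runif(200, 10, 500)
)
panel$value <- 100 + 3 * panel$inv - panel$capital + rnorm(200, sd = 50)

firms   <- unique(panel$firm)
by_firm <- split(panel, panel$firm)   # one data frame per firm

# Statistic for boot(): x is the firm vector, i the resampled positions
myfit <- function(x, i) {
  mydata <- do.call(rbind, unname(by_firm[as.character(x[i])]))
  coef(lm(value ~ inv + capital, data = mydata))
}

b0 <- boot(firms, myfit, R = 999)

# Replicates: one row per bootstrap sample, one column per coefficient
reps <- b0$t
colnames(reps) <- names(b0$t0)

# Bootstrap standard errors and a percentile interval for the inv coefficient
boot_se <- apply(reps, 2, sd)
ci_inv  <- boot.ci(b0, type = "perc", index = 2)
```

Here index = 2 selects the second element of the statistic (the inv coefficient); the interval endpoints are in the last two entries of ci_inv$percent.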