baha-kev - 1 year ago 68

R Question

I'm trying to efficiently implement a block bootstrap technique to get the distribution of regression coefficients. The main outline is as follows:

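I have a panel data set, say indexed by `firm` and `year`. For each iteration, I want to sample n firms with replacement, `rbind()` all of the sampled firms' observations into a new data frame, run the regression, and store the coefficients.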

- Each firm can be selected multiple times, so its data must be included once per selection in each iteration's data set.
- Using a loop-and-`subset()` approach, like the one below, seems computationally burdensome.
- My real data frame, n, and number of iterations are much larger than in the example below.
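To make the first point concrete, here is a minimal sketch on a hypothetical three-firm toy panel (the names `panel`, `draw`, and `boot_df` are illustrative, not from the original). When firm 2 is drawn twice, its full block of years appears twice in the stacked data.

```r
# Hypothetical toy panel: 3 firms observed in 2 years each
panel <- data.frame(
  firm = rep(1:3, each = 2),
  year = rep(c(2000, 2001), times = 3),
  y    = c(1.0, 1.5, 2.0, 2.5, 3.0, 3.5)
)

# A fixed (non-random) draw in which firm 2 is selected twice
draw <- c(2, 2, 3)

# Stack each drawn firm's full block of rows, once per time it was drawn
boot_df <- do.call(rbind, lapply(draw, function(f) panel[panel$firm == f, ]))
print(boot_df)
```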

My initial thought is to break the full data frame into a list by `subject` with `split()`, draw the subjects for each iteration with `sample(unique(df1$subject),n,replace=TRUE)`, and then build the new data frame with `quickdf()` from the `plyr` package.
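The split-then-sample idea can be sketched in base R as follows. This is a minimal sketch under stated assumptions: the panel is simulated (10 firms by 20 years) rather than the real data, and `do.call(rbind, ...)` is swapped in for `plyr::quickdf()`. The key point is that the split happens once, and indexing the list by repeated names repeats the corresponding data frames.

```r
set.seed(42)

# Hypothetical simulated panel standing in for the real data: 10 firms x 20 years
panel <- expand.grid(year = 1935:1954, firm = 1:10)
panel$inv     <- rnorm(nrow(panel), mean = 100, sd = 20)
panel$capital <- rnorm(nrow(panel), mean = 200, sd = 50)
panel$value   <- 300 + 5 * panel$inv - panel$capital + rnorm(nrow(panel), sd = 50)

# Split once, outside any loop: a named list with one data frame per firm
by_firm <- split(panel, panel$firm)

# Draw n firm names with replacement; indexing the list by name repeats
# a firm's data frame as many times as it was drawn
n <- 10
draw <- sample(names(by_firm), n, replace = TRUE)
boot_df <- do.call(rbind, by_firm[draw])

coef(lm(value ~ inv + capital, data = boot_df))
```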

Any thoughts are appreciated!

Example slow code:

```
library(plm)
data("Grunfeld", package = "plm")

firms = unique(Grunfeld$firm)
n = 10
iterations = 100
mybootresults = list()

for (j in 1:iterations) {
  # sample n firms (as positions in 'firms') with replacement
  v = sample(length(firms), n, replace = TRUE)
  newdata = NULL
  for (i in 1:n) {
    # append every year of the i-th sampled firm
    newdata = rbind(newdata, subset(Grunfeld, firm == firms[v[i]]))
  }
  reg1 = lm(value ~ inv + capital, data = newdata)
  mybootresults[[j]] = coefficients(reg1)
}

mybootresults = as.data.frame(t(matrix(unlist(mybootresults), ncol = iterations)))
names(mybootresults) = names(reg1$coefficients)
mybootresults
```

```
  (Intercept)      inv    capital
1    373.8591 6.981309 -0.9801547
2    370.6743 6.633642 -1.4526338
3    528.8436 6.960226 -1.1597901
4    331.6979 6.239426 -1.0349230
5    507.7339 8.924227 -2.8661479
...
```
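One way the loop above might be sped up is sketched below, assuming a simulated panel in place of `Grunfeld` and the hypothetical helper name `one_draw()`. Growing `newdata` with `rbind()` inside a loop tends to copy the accumulated rows on every pass; here the row numbers for each firm are computed once with `split(seq_len(nrow(...)), ...)`, so each iteration does a single row subset. No timings are claimed; this only illustrates the restructuring.

```r
set.seed(42)

# Hypothetical simulated panel standing in for Grunfeld: 10 firms x 20 years
panel <- expand.grid(year = 1935:1954, firm = 1:10)
panel$inv     <- rnorm(nrow(panel), mean = 100, sd = 20)
panel$capital <- rnorm(nrow(panel), mean = 200, sd = 50)
panel$value   <- 300 + 5 * panel$inv - panel$capital + rnorm(nrow(panel), sd = 50)

n <- 10
iterations <- 100

# Row numbers belonging to each firm, computed once
rows_by_firm <- split(seq_len(nrow(panel)), panel$firm)

one_draw <- function() {
  # Sample firms (as list positions) with replacement
  picked <- sample(length(rows_by_firm), n, replace = TRUE)
  # Concatenate their row numbers; repeated firms contribute repeated rows
  rows <- unlist(rows_by_firm[picked], use.names = FALSE)
  coef(lm(value ~ inv + capital, data = panel[rows, ]))
}

# replicate() returns a 3 x iterations matrix; transpose so each row is one draw
mybootresults <- as.data.frame(t(replicate(iterations, one_draw())))
head(mybootresults)
```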

Recommended for you: Get network issues from **WhatsUp Gold**. **Not end users.**

Answer

How about something like this:

```
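library(boot)  # provides boot()
data("Grunfeld", package = "plm")

# x: the vector of firm ids; i: the resampled positions supplied by boot()
myfit <- function(x, i) {
  # stack the full block of rows for every resampled firm (repeats allowed)
  mydata <- do.call("rbind", lapply(i, function(k) subset(Grunfeld, firm == x[k])))
  coefficients(lm(value ~ inv + capital, data = mydata))
}
firms <- unique(Grunfeld$firm)
b0 <- boot(firms, myfit, 999)  # 999 bootstrap replicates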
```
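The same `boot()` idea can be combined with precomputed row indices, so that each replicate does one row subset rather than one `subset()` call per firm. The sketch below assumes simulated data in place of `Grunfeld`, 499 replicates, and the hypothetical names `rows_by_firm`, `firm_ids`, and `firm_fit`. It also shows two common ways to summarize the result: column-wise standard deviations of `b$t` as bootstrap standard errors, and a percentile interval from `boot.ci()` for the second coefficient (`inv`).

```r
library(boot)  # recommended package distributed with R; provides boot() and boot.ci()

set.seed(42)

# Hypothetical simulated panel standing in for Grunfeld: 10 firms x 20 years
panel <- expand.grid(year = 1935:1954, firm = 1:10)
panel$inv     <- rnorm(nrow(panel), mean = 100, sd = 20)
panel$capital <- rnorm(nrow(panel), mean = 200, sd = 50)
panel$value   <- 300 + 5 * panel$inv - panel$capital + rnorm(nrow(panel), sd = 50)

# Row numbers belonging to each firm, computed once instead of per replicate
rows_by_firm <- split(seq_len(nrow(panel)), panel$firm)
firm_ids <- names(rows_by_firm)

# boot() calls this with the firm ids (x) and the resampled positions (i)
firm_fit <- function(x, i) {
  rows <- unlist(rows_by_firm[x[i]], use.names = FALSE)
  coef(lm(value ~ inv + capital, data = panel[rows, ]))
}

b <- boot(firm_ids, firm_fit, R = 499)

# Bootstrap standard errors: column-wise SD of the replicate matrix b$t
boot_se <- apply(b$t, 2, sd)

# Percentile interval for the 'inv' coefficient (second element of the statistic)
ci <- boot.ci(b, type = "perc", index = 2)
```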

Recommended from our users: **Dynamic Network Monitoring from WhatsUp Gold from IPSwitch**. ** Free Download**