pachamaltese - 5 months ago 38

R Question

I am using the `simpleboot` package.

This is my function:

    lb_weighted_median_dplyr <- function(x, v) {
      set.seed(1234)
      b <- one.boot(x, weights = v, FUN = function(x, w) matrixStats::weightedMedian(x, w = v, na.rm = TRUE), R = 100, student = FALSE)
      round(perc(b, 0.025), 0)
    }

The function calculates the lower bound of the confidence interval when I run

`ddply(wage_by_gender_2015, .(sex,region), summarise, FUN = lb_weighted_median_dplyr(wage, exp_region))`

where `wage` and `exp_region` are columns of `wage_by_gender_2015`.

I don't have data for some regions, so the function fails for those regions and returns

`Error in eval(substitute(expr), envir, enclos) : NA in probability vector`

How can I bypass that error and obtain NA as the lower bound for a region without data?

A `dplyr` version fails with the same `NA in probability vector` error:

    grouped <- group_by(wage_by_gender_2015, sex, region)
    dplyr::summarise(grouped, FUN = lb_weighted_median_dplyr(wage, exp_region))

Relevant sample of the data here: http://users.dcc.uchile.cl/~mvargas/casen/wage_by_gender_2015.RData

Answer

```
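library(plyr)        # ddply
library(simpleboot)  # one.boot, perc

# Toy data: 200 rows, with the 100 random values recycled to fill them
wage_by_gender_2015 <- data.frame(
  sex = rep(c("male", "female"), 100),
  region = rep(c("north", "south", "east", "west"), 50),
  exp_region = abs(rnorm(100)),
  wage = abs(rnorm(100))
)
wage_by_gender_2015$exp_region[10] <- NA  # one missing weight triggers the error
ddply(wage_by_gender_2015, .(sex, region), summarise, FUN = lb_weighted_median_dplyr(wage, exp_region))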
```

`Error in sample.int(length(x), replace = TRUE, ...) : NA in probability vector`
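
The message above points at `sample.int`, which suggests the weights end up as a probability vector for resampling. As a minimal base-R sketch (the `weights` values here are made up for illustration), a single `NA` in `prob` is enough to make the sampling call fail:

```r
# Illustrative weights with one missing value (not taken from the real data)
weights <- c(0.2, NA, 0.5)

# Try to resample indices with these weights and capture the error message
msg <- tryCatch(
  {
    sample.int(length(weights), replace = TRUE, prob = weights)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(msg)
```

So the failure seems to come from a missing weight, not from the `wage` values themselves.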

```
# impute the missing weight (na.roughfix fills numeric NAs with the median)
wage_by_gender_2015$exp_region <- RRF::na.roughfix(wage_by_gender_2015$exp_region)
ddply(wage_by_gender_2015, .(sex,region), summarise, FUN = lb_weighted_median_dplyr(wage, exp_region))
```

         sex region FUN
    1 female  south   0
    2 female   west   0
    3   male   east   1
    4   male  north   0
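
For a numeric vector, `na.roughfix` replaces missing values with the median of the observed ones. If you would rather not depend on `RRF`, the same idea can be sketched in base R. The helper name `impute_median` is hypothetical, not part of any package:

```r
# Hypothetical helper: fill NAs in a numeric vector with the observed median
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Illustrative weights with one missing value
w <- c(0.4, NA, 1.2, 0.8)
imputed <- impute_median(w)
print(imputed)
```

Note that imputing changes the weights for the affected group, so the resulting bound is an estimate under that assumption and not a marker for missing data.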

As mentioned in the comment, I would have used your sample data, but it was missing the `sex` column.
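
If the goal is to get `NA` for the failing groups instead of imputing, one option is to wrap the call in `tryCatch`. The sketch below uses only base R. `na_on_error` is a hypothetical helper, and `resample_median` is a simplified stand-in for `lb_weighted_median_dplyr` that fails in the same way on `NA` weights:

```r
# Hypothetical helper: wrap f so that any error yields NA instead of stopping
na_on_error <- function(f) {
  function(...) tryCatch(f(...), error = function(e) NA_real_)
}

# Stand-in for the bootstrap function: resamples x with weights w,
# so an NA in w raises "NA in probability vector"
resample_median <- function(x, w) {
  idx <- sample.int(length(x), replace = TRUE, prob = w)
  median(x[idx])
}

safe_median <- na_on_error(resample_median)

safe_median(c(1, 2, 3), c(0.2, NA, 0.5))   # NA
safe_median(c(1, 2, 3), c(0.2, 0.3, 0.5))  # a number
```

Applied to the question, this might look like `FUN = na_on_error(lb_weighted_median_dplyr)(wage, exp_region)` inside the `ddply` call. Keep in mind that `tryCatch` with a generic `error` handler swallows every error, so unrelated problems would also show up as `NA`.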