Jamesm131 - 3 months ago 15

R Question

I would like to estimate the missing values of a numeric variable in a data frame using the median of that variable within each combination of other factors, and then replace the NAs in the numeric variable with these estimates.

I have a data frame like this:

```
Fac1 Fac2 Var1
A    a    20
A    b    30
B    a    5
B    b    10
...
```

I have used the `aggregate` function to find the median for each combination of factors:

```
A a = 22
A b = 28
B a = 12
B b = 8
```
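For reference, per-group medians like the ones above can be computed with a single `aggregate` call. This is only a sketch: the data frame `df` below and its values are made up for illustration and are not the asker's data, so the medians differ from those listed.

```r
# Hypothetical sample data; the values are invented for illustration.
df <- data.frame(
  Fac1 = c("A", "A", "A", "B", "B", "B"),
  Fac2 = c("a", "a", "b", "a", "a", "b"),
  Var1 = c(20, 24, 30, 5, NA, 10)
)

# Median of Var1 for each Fac1/Fac2 combination.
# The formula interface drops rows containing NA by default
# (na.action = na.omit), so the NA does not affect the medians.
medians <- aggregate(Var1 ~ Fac1 + Fac2, data = df, FUN = median)
print(medians)
```

Adding more factors only means adding more terms on the right-hand side of the formula.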

So any NA in Var1 would be replaced with the median for its combination of the factors.

I understand that this could be done by replacing the missing values for each subset of the data individually. However, that would become tedious quickly with more than two factors.

I was wondering if there are some more efficient ways to get this result.

Answer

You haven't provided sample data, but based on your question I think this should work.

As @Roland mentioned, there is no need to calculate the `median` separately.

Assume your data frame is called `df`. For every group (here, each combination of `Fac1` and `Fac2`) we calculate the median with the `NA` values removed. We then select only the positions where `Var1` is `NA` and replace each one with the median of its group.

```
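# ave() returns a vector as long as df$Var1 holding each row's group median;
# the trailing index keeps only the entries that line up with the NAs.
df$Var1[is.na(df$Var1)] <- ave(df$Var1, df$Fac1, df$Fac2,
                               FUN = function(x) median(x, na.rm = TRUE))[is.na(df$Var1)]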
```
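Since the question had no sample data, here is a small worked sketch of the same `ave` approach. The data frame and the intermediate names `group_median` and `missing` are hypothetical, chosen only to make the steps visible.

```r
# Hypothetical sample data with two missing values in Var1.
df <- data.frame(
  Fac1 = c("A", "A", "A", "B", "B", "B", "B"),
  Fac2 = c("a", "a", "a", "b", "b", "b", "a"),
  Var1 = c(20, 24, NA, 8, NA, 12, 5)
)

# Per-row median of the row's Fac1/Fac2 group, ignoring NAs.
group_median <- ave(df$Var1, df$Fac1, df$Fac2,
                    FUN = function(x) median(x, na.rm = TRUE))

# Overwrite only the missing entries with their group's median.
missing <- is.na(df$Var1)
df$Var1[missing] <- group_median[missing]

print(df)
```

With more than two factors, pass the extra grouping vectors to `ave` before `FUN`. A group whose values are all `NA` has no median to borrow, so its entries stay `NA`.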