Jamesm131 - 25 days ago

# Most efficient way to replace NAs in a data frame column based on combinations of other factor columns (using the median as an estimate) in R

I would like to estimate the missing values of a numeric variable in a data frame using the median of that same variable within each combination of other factor variables. I would then like to replace the NAs in the numeric variable with these estimates.

I have a data frame like this:

```
Fac1   Fac2   Var1
A      a      20
A      b      30
B      a      5
B      b      10
...
```
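To experiment with the approaches in this thread, a small reproducible data frame of this shape can be built as follows. The values are invented for illustration (they are not the asker's data) and were chosen so that, ignoring NAs, the group medians match the ones given in the question; the last two rows have a missing `Var1`.

```r
# Hypothetical toy data: two factor columns and one numeric column with NAs.
# Values are invented for illustration only.
df <- data.frame(
  Fac1 = factor(c("A", "A", "B", "B", "A", "A", "B", "B", "A", "B")),
  Fac2 = factor(c("a", "b", "a", "b", "a", "b", "a", "b", "a", "b")),
  Var1 = c(20, 30, 5, 10, 24, 26, 19, 6, NA, NA)
)

str(df)
```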

I have used the `aggregate` function to find these medians for each combination of factors:

```
A a = 22
A b = 28
B a = 12
B b = 8
```
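For reference, one way such a table of medians might be produced is the formula interface of `aggregate()`. This is a sketch on hypothetical toy data (the asker's actual call is not shown). The formula method's default `na.action = na.omit` drops rows where `Var1` is `NA` before the medians are computed.

```r
# Hypothetical toy data (invented values; the last two rows have NA in Var1)
df <- data.frame(
  Fac1 = factor(c("A", "A", "B", "B", "A", "A", "B", "B", "A", "B")),
  Fac2 = factor(c("a", "b", "a", "b", "a", "b", "a", "b", "a", "b")),
  Var1 = c(20, 30, 5, 10, 24, 26, 19, 6, NA, NA)
)

# One median of Var1 per Fac1 x Fac2 combination; rows with NA in Var1
# are dropped first because the formula method uses na.action = na.omit
meds <- aggregate(Var1 ~ Fac1 + Fac2, data = df, FUN = median)
print(meds)
```

With these toy values the medians come out to A a = 22, A b = 28, B a = 12 and B b = 8.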

Any NAs in `Var1` should then be replaced with the median for the corresponding combination of factors.
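If the medians have already been computed with `aggregate()`, one possible way to do the replacement in a single step, rather than subset by subset, is to build a key from the factor columns and look up each `NA` row's key in the table of medians with `match()`. The toy data, the `_` separator and the `key_*` names are assumptions for illustration; the separator must not occur in any factor level.

```r
# Hypothetical toy data (invented values; rows 9 and 10 have NA in Var1)
df <- data.frame(
  Fac1 = factor(c("A", "A", "B", "B", "A", "A", "B", "B", "A", "B")),
  Fac2 = factor(c("a", "b", "a", "b", "a", "b", "a", "b", "a", "b")),
  Var1 = c(20, 30, 5, 10, 24, 26, 19, 6, NA, NA)
)

# Group medians computed from the non-missing values only
meds <- aggregate(Var1 ~ Fac1 + Fac2, data = df, FUN = median)

# One key per row, e.g. "A_a", and the same kind of key for each row of meds
key_df   <- paste(df$Fac1, df$Fac2, sep = "_")
key_meds <- paste(meds$Fac1, meds$Fac2, sep = "_")

# Replace each NA with the median of its own factor combination
na_rows <- is.na(df$Var1)
df$Var1[na_rows] <- meds$Var1[match(key_df[na_rows], key_meds)]
df
```

Adding a third factor only means adding it to the formula and to both `paste()` calls. A combination whose `Var1` values are all `NA` is absent from `meds`, so `match()` returns `NA` and those rows stay missing.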

I understand that this could be done by replacing the missing values in each subset of the data individually; however, that would quickly become tedious with more than two factors.
I was wondering whether there are more efficient ways to get this result.
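One way to avoid writing out each factor by hand is to pass any number of grouping columns to `ave()` through `do.call()`. The sketch below wraps that in a hypothetical helper, `fill_na_with_group_median()` (the name and arguments are made up for illustration), and runs it on invented data with three factors.

```r
# Hypothetical helper: replace NAs in data[[value_col]] with the median of
# that column within each combination of the columns named in group_cols.
fill_na_with_group_median <- function(data, value_col, group_cols) {
  x <- data[[value_col]]
  # ave(x, g1, g2, ..., FUN = f) applies f within each group; do.call lets
  # the grouping columns be supplied as a list. unname() keeps a column
  # called e.g. "x" or "FUN" from being matched to ave()'s own arguments.
  group_medians <- do.call(
    ave,
    c(list(x),
      unname(as.list(data[group_cols])),
      list(FUN = function(v) median(v, na.rm = TRUE)))
  )
  # Only the missing positions take the group median
  na_rows <- is.na(x)
  x[na_rows] <- group_medians[na_rows]
  data[[value_col]] <- x
  data
}

# Invented example with three factor columns
df3 <- data.frame(
  Fac1 = factor(c("A", "A", "A", "B", "B", "B")),
  Fac2 = factor(c("a", "a", "a", "b", "b", "b")),
  Fac3 = factor(c("x", "x", "x", "y", "y", "y")),
  Var1 = c(1, 3, NA, 10, NA, 20)
)
out <- fill_na_with_group_median(df3, "Var1", c("Fac1", "Fac2", "Fac3"))
out$Var1
```

Here the two NAs become 2 (median of 1 and 3) and 15 (median of 10 and 20). A group whose values are all `NA` stays `NA`, because `median()` of an empty vector is `NA`.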

As @Roland mentioned, there is no need to calculate the `median` separately.
Assuming your data frame is called `df`: for every group (here each combination of `Fac1` and `Fac2`) we calculate the median with the `NA` values removed. We then select only the indices that have `NA` values and replace them with their group's median.
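```
# Group median (ignoring NAs) for every row, then keep only the NA positions
df$Var1[is.na(df$Var1)] <- ave(df$Var1, df$Fac1, df$Fac2, FUN = function(x) median(x, na.rm = TRUE))[is.na(df$Var1)]
```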
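A variant of the same `ave()` idea, offered here as a sketch rather than as part of the original answer, does the replacement inside each group with `replace()`. Because `ave()` returns a vector aligned with the original rows, the whole column can be overwritten without indexing by `is.na(df$Var1)` on both sides. The toy data is invented for illustration.

```r
# Hypothetical toy data (invented values; rows 9 and 10 have NA in Var1)
df <- data.frame(
  Fac1 = factor(c("A", "A", "B", "B", "A", "A", "B", "B", "A", "B")),
  Fac2 = factor(c("a", "b", "a", "b", "a", "b", "a", "b", "a", "b")),
  Var1 = c(20, 30, 5, 10, 24, 26, 19, 6, NA, NA)
)

# Within each Fac1 x Fac2 group, swap NAs for that group's median
# (computed from the non-missing values); other values are unchanged.
df$Var1 <- ave(df$Var1, df$Fac1, df$Fac2,
               FUN = function(x) replace(x, is.na(x), median(x, na.rm = TRUE)))
df
```

Extra factors are simply added as further arguments before `FUN`. Groups whose values are all `NA` stay `NA`, since their median is `NA` too.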