stackoverflowuser2010 - 1 year ago 166
R Question

# Replace NA with mean based on row subset matching another column?

I have data where each row contains a person's gender and weight (in lbs):

``````genders <- c("FEMALE", "FEMALE", "FEMALE", "FEMALE", "FEMALE", "MALE", "MALE", "MALE", "MALE")
weights <- c(110.0, 120.0, 112.0, NA, NA, 190.0, 202.0, 195.0, NA)

df <- data.frame(gender=genders, weight=weights)
df
#   gender weight
# 1 FEMALE    110
# 2 FEMALE    120
# 3 FEMALE    112
# 4 FEMALE     NA
# 5 FEMALE     NA
# 6   MALE    190
# 7   MALE    202
# 8   MALE    195
# 9   MALE     NA
``````

For every row that an NA in the
`weight`
column, I would like to replace / impute the NA with the
`weight`
mean, but the mean should be calculated using only the rows that match the same
`gender`
value as the row with the NA.

Specifically, rows 4 and 5 have a
`gender`
of FEMALE and
`weight`
of NA. I want to replace the NA with the mean
`weight`
computed over the subset of rows that match the
`gender`
of FEMALE. In this case, the mean would be (110+120+112)/3=114.0 from the other rows 1, 2, and 3.

Likewise, I want to replace the NA in row 9 with the mean of the weights for MALE
`gender`
.

I tried the following command, but it replaced the NA with the mean weight over all users across both genders, which is not what I want.

``````df\$weight[is.na(df\$weight)] <- mean(subset(df, gender=df\$gender)\$weight, na.rm=T)
df
#   gender   weight
# 1 FEMALE 110.0000
# 2 FEMALE 120.0000
# 3 FEMALE 112.0000
# 4 FEMALE 154.8333
# 5 FEMALE 154.8333
# 6   MALE 190.0000
# 7   MALE 202.0000
# 8   MALE 195.0000
# 9   MALE 154.8333
``````

I searched other questions, but they are not quite the same problem as mine:

"Replace NA with mean matching the same ID"

"How to replace NA with mean by subset in R (impute with plyr?)"

"How to replace NA values in a table for selected columns? data.frame, data.table"

You could use `ave()` with `replace()` (or standard manual replacement).

``````df\$weight <- with(df, ave(weight, gender,
FUN = function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))))
``````

which gives

``````  gender   weight
1 FEMALE 110.0000
2 FEMALE 120.0000
3 FEMALE 112.0000
4 FEMALE 114.0000
5 FEMALE 114.0000
6   MALE 190.0000
7   MALE 202.0000
8   MALE 195.0000
9   MALE 195.6667
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download