Ziwei Ziwei - 3 months ago 6
R Question

how to replace the NA in a data frame with the average number of this data frame

I have a data frame like this:

nums id
1233 1
3232 2
2334 3
3330 1
1445 3
3455 3
7632 2
NA 3
NA 1


And I can know the average "nums" of each "id" by using:

id_avg <- aggregate(nums ~ id, data = dat, FUN = mean)


What I would like to do is to replace the NA with the value of the average number of the corresponding id. for example, the average "nums" of 1,2,3 are 1000, 2000, 3000, respectively. The NA when id == 3 will be replaced by 3000, the last NA whose id == 1 will be replaced by 1000.
I tried the following code to achieve this:

temp <- dat[is.na(dat$nums),]$id
dat[is.na(dat$nums),]$nums <- id_avg[id_avg[,"id"] ==temp,]$nums


However, the second part

id_avg[id_avg[,"id"] ==temp,]$nums


is always NA, which means I always pass NA to the NAs I want to replace.
I don't know where I was wrong, or do you have better method to do this?
Thank you

Answer

Here is a dplyr way:

df %>% 
       group_by(id) %>% 
       mutate(nums = replace(nums, is.na(nums), as.integer(mean(nums, na.rm = T))))

# Source: local data frame [9 x 2]
# Groups: id [3]

#    nums    id
#   <int> <int>
# 1  1233     1
# 2  3232     2
# 3  2334     3
# 4  3330     1
# 5  1445     3
# 6  3455     3
# 7  7632     2
# 8  2411     3
# 9  2281     1

You essentially want to merge the id_avg back to the original data frame by the id column, so you can also use match to follow your original logic:

dat$nums[is.na(dat$nums)] <- id_avg$nums[match(dat$id[is.na(dat$nums)], id_avg$id)]
dat
#        nums id
# 1: 1233.000  1
# 2: 3232.000  2
# 3: 2334.000  3
# 4: 3330.000  1
# 5: 1445.000  3
# 6: 3455.000  3
# 7: 7632.000  2
# 8: 2411.333  3
# 9: 2281.500  1