Kia Eisinga - 2 months ago 19
R Question

# Replacing NA with duplicates from adjacent row R

I have a dataset that contains features

`Date`
,
`Age`
and
`Customer_ID`
. Some of the rows in
`Age`
have missing values (NAs) in them and I would like to impute them.

Here is some example data:

``````column1 <- c("201101", "201102", "201101", "201102", "201103")
column2 <- c("12-17", "12-17", "30-35", NA, NA)
column3 <- c("1234", "1234", "5678", "5678", "5678")
df <- data.frame(column1, column2, column3)

Date      Age      Customer_ID
201101    12-17    1234
201102    12-17    1234
201101    30-35    5678
201102    NA       5678
201103    NA       5678
``````

I would like to replace the NAs in
`Age`
with 30-35.

So for all NAs, it has to check whether there is another row with the same
`Customer_ID`
and replace the NA with the value for
`Age`
stated in the other row.

Any ideas on how to do this? Thanks.

You can use the `fill` function from `tidyr`. It is a `tidyr` function for last observation carried forward, i.e, fill NA values with previous non-NA value. In order for this to work, you can use `arrange` to sort the column2 which sorts the `NA` values behind non-NAs and then you can group by customer ID and fill the `Age` column:

``````library(dplyr)
library(tidyr)
df %>% arrange(Age) %>% group_by(Customer_ID) %>% fill(Age)

# Source: local data frame [5 x 3]
# Groups: Customer_ID [2]

#      Date    Age Customer_ID
#    <fctr>  <fctr>  <fctr>
# 1  201101   12-17    1234
# 2  201102   12-17    1234
# 3  201101   30-35    5678
# 4  201102   30-35    5678
# 5  201103   30-35    5678
``````