Edu Edu - 2 months ago 6
R Question

Recoding missing data in longitudinal data frames with R

I have a data frame with a similar longitudinal structure as

data
:

data = data.frame (
ID = c("a","a","a","b","b","b","c","c", "c"),
period = c(1,2,3,1,2,3,1,2,3),
size = c(3,3,NA, NA, NA,1, 14,14, 14))


The values of the variable
size
are fixed so that each period has the same value for
size
. Yet some observations have missing values. My aim consists of replacing these missing values
with the value of
size
associated with the periods where there is no missing (e.g. 3 for
ID
"a" and 1 for
ID
"b").

The desired data frame should look something similar to:

data.1

ID period value
a 1 3
a 2 3
a 3 3
b 1 1
b 2 1
b 3 1
c 1 14
c 2 14
c 3 14


I have tried different combinations of the formula below but I don't get the result I am looking for.

library(dplyr)

data.1 = data %>% group_by(ID) %>%
mutate(new.size = ifelse(is.na(size), !is.na(size),
ifelse(!is.na(size), size, 0)))


That yields the following:

data.1
Source: local data frame [9 x 4]
Groups: ID [3]

ID period size new.size
(fctr) (dbl) (dbl) (dbl)
1 a 1 3 3
2 a 2 3 3
3 a 3 NA 0
4 b 1 NA 0
5 b 2 NA 0
6 b 3 1 1
7 c 1 14 14
8 c 2 14 14
9 c 3 14 14


I would be grateful if someone could give me a hint on how to get the right solution.

Answer

here another solution using dplyr with na.omit

group_by(data, ID) %>%
     mutate(value=na.omit(size)[1])
Source: local data frame [9 x 4]
Groups: ID [3]

      ID period  size value
  <fctr>  <dbl> <dbl> <dbl>
1      a      1     3     3
2      a      2     3     3
3      a      3    NA     3
4      b      1    NA     1
5      b      2    NA     1
6      b      3     1     1
7      c      1    14    14
8      c      2    14    14
9      c      3    14    14

note that you can replace na.omit with max(size, na.rm=TRUE) if you are looking for maximum for example.

Comments