N Stenfors - 7 months ago 37
R Question

# Extract first and last values among a number of columns in data frame

I have observed subjects

`a`
-
`d`
during 2-4 years and each year they reported a numerical value. I want to extract the first and last value of each subject, ignoring NAs. How do I create the new variables
`first_value`
and
`last_value`
? In this example, I have included the desired result:

``````df <- data.frame(subject = c("a","b","c","d"),
year1 = c(1, 2, NA, NA),
year2 = c(3, 4, NA, 5),
year3 = c(6, 7, 8, NA),
year4 = c(9, 10, NA, 11),
first_value <- c(1, 2, 8, 5),
last_value <- c(9, 10, 8, 11))
``````

And what would be the solution if variables
`year1`
-
`year4`
were categorical?

Using the `data.table` package:

``````library(data.table)
setDT(df)[, `:=` (first_value = na.omit(unlist(.SD))[1],
last_value = tail(na.omit(unlist(.SD)),1)),
by = subject][]
``````

which gives:

``````   subject year1 year2 year3 year4 first_value last_value
1:       a     1     3     6     9           1          9
2:       b     2     4     7    10           2         10
3:       c    NA    NA     8    NA           8          8
4:       d    NA     5    NA    11           5         11
``````

Following the suggestion of @alexis_laz, you can use `max.col` as follows to get the repective values:

``````f <- max.col(!is.na(df[c("year1", "year2", "year3", "year4")]), 'first')
l <- max.col(!is.na(df[c("year1", "year2", "year3", "year4")]), 'last')

df\$first_value <- sapply(seq_along(f), function(i) df[,-1][i,f[i]])
df\$last_value <- sapply(seq_along(l), function(i) df[,-1][i,l[i]])
``````

which will get you the same result.

And using the `dplyr` and `tidyr` packages:

``````library(dplyr)
library(tidyr)

df %>%
gather(year, val, 2:5) %>%
filter(!is.na(val)) %>%
group_by(subject) %>%
summarise(first_value = first(val),
last_value = last(val)) %>%
left_join(df, ., by = 'subject')
``````

WARNING: A variation of this without using `filter` and using `na.omit(val)` (or `val[!is.na(val)]`) in `summarise`:

``````df %>%
gather(year, val, 2:5) %>%
group_by(subject) %>%
summarise(first_value = first(na.omit(val)),
last_value = last(na.omit(val))) %>%
left_join(df, ., by = 'subject')
``````

won't work as a result of the bugs reported here and here.