Noobie Noobie - 24 days ago 8
R Question

dplyr: how to find the first non-missing string by group?

Consider the following simple example

library(dplyr)

group <- c('A','A','A','B','B','B','B')
names <- c(NA,'fred',NA,'josh','josh',NA,NA)
data <- tibble(group, names)  # tibble() replaces the deprecated data_frame()

> data
# A tibble: 7 × 2
  group names
  <chr> <chr>
1     A  <NA>
2     A  fred
3     A  <NA>
4     B  josh
5     B  josh
6     B  <NA>
7     B  <NA>


Here, I would like to get, for each group, the first non-missing name in names. How can I do that? The attempts below using coalesce and first both fail.

data %>% group_by(group) %>% mutate(first_non_missing = first(names),
first_non_missing_alt = coalesce(names)) %>% ungroup()

# A tibble: 7 × 4
  group names first_non_missing first_non_missing_alt
  <chr> <chr>             <chr>                 <chr>
1     A  <NA>              <NA>                  <NA>
2     A  fred              <NA>                  fred
3     A  <NA>              <NA>                  <NA>
4     B  josh              josh                  josh
5     B  josh              josh                  josh
6     B  <NA>              josh                  <NA>
7     B  <NA>              josh                  <NA>


Indeed, for group A, first_non_missing should be fred for all three observations.

Many thanks!

Answer

summarise will give one row per group. Here, which(!is.na(names)) finds the positions of the non-missing names, and [1] takes the first of them:

data %>%
  group_by(group) %>%
  summarise(first_non_missing = names[which(!is.na(names))[1]])

gives

  group first_non_missing
  <chr>             <chr>
1     A              fred
2     B              josh

If you still want all of the rows, replace summarise with mutate.
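
For reference, the mutate variant could look like the sketch below. It rebuilds the example data from the question so that it runs on its own; the object name result is only illustrative. As far as I can tell, the original attempts fail because first(names) returns the first element whether or not it is missing, and coalesce called with a single vector just returns that vector unchanged.

```r
library(dplyr)

# Same example data as in the question
data <- tibble(
  group = c("A", "A", "A", "B", "B", "B", "B"),
  names = c(NA, "fred", NA, "josh", "josh", NA, NA)
)

# mutate() keeps all seven rows and repeats each group's value on every row
result <- data %>%
  group_by(group) %>%
  mutate(first_non_missing = names[which(!is.na(names))[1]]) %>%
  ungroup()

result$first_non_missing
```

With this indexing approach, a group that has no non-missing name at all should simply get NA, because which() returns an empty vector and its first element is NA.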
