I am attempting to work through Hadley Wickham's R for Data Science and have gotten tripped up on the following question: "How could you use arrange() to sort all missing values to the start? (Hint: use is.na())" I am using the flights dataset included in the nycflights13 package. Given that arrange() always sorts missing values to the bottom of the data frame, I am not sure how one would do the opposite across all of the variables that contain missing values. I realize that this question can be answered with base R code, but I am specifically interested in how this would be done using dplyr and a call to the arrange() and is.na() functions. Thanks.
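We can wrap is.na() in desc() to get the missing values at the start. is.na() returns TRUE for missing values and FALSE otherwise; because FALSE sorts before TRUE in ascending order, desc() is what moves the missing rows to the top: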
library(dplyr)
library(nycflights13)
flights %>% arrange(desc(is.na(dep_time)), desc(is.na(dep_delay)), desc(is.na(arr_time)),
                    desc(is.na(arr_delay)), desc(is.na(tailnum)), desc(is.na(air_time)))
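To see the mechanism in isolation, here is a minimal sketch on a small hypothetical tibble (df and x are made up for illustration and are not part of nycflights13). It also shows !is.na(x) as an equivalent sort key, since it is FALSE for the missing values and FALSE sorts first.

```r
library(dplyr)

# Hypothetical toy data, for illustration only
df <- tibble(x = c(3, NA, 1, NA, 2))

# desc(is.na(x)) puts the missing rows first; the second key, x,
# then sorts the remaining (non-missing) rows in ascending order
na_first <- df %>% arrange(desc(is.na(x)), x)
na_first$x  # NA NA 1 2 3

# Equivalent key: !is.na(x) is FALSE for missing values, and FALSE sorts first
na_first_alt <- df %>% arrange(!is.na(x), x)
```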
These are the only variables in flights that contain NA values, as this check shows:
names(flights)[colSums(is.na(flights)) >0] # "dep_time" "dep_delay" "arr_time" "arr_delay" "tailnum" "air_time"
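The same check can also be done inside dplyr. This is a sketch assuming dplyr 1.0.0 or later (where across() and where() are available); the tibble below is a hypothetical stand-in for flights, so the column names are illustrative only.

```r
library(dplyr)

# Hypothetical toy data standing in for flights
df <- tibble(
  a = c(1, NA, 3),
  b = c("x", "y", "z"),
  c = c(NA, NA, 2)
)

# Per-column NA counts, the dplyr counterpart of colSums(is.na(df))
na_counts <- df %>% summarise(across(everything(), ~ sum(is.na(.x))))

# Names of the columns that contain at least one NA
na_cols <- df %>% select(where(~ any(is.na(.x)))) %>% names()
na_cols  # "a" "c"
```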
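Instead of passing the variable names one at a time, we can build the sort expressions as strings and pass them to arrange_(), the standard-evaluation version of arrange() (note that arrange_() and the other underscore verbs have been deprecated since dplyr 0.7.0):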
nm1 <- paste0("desc(is.na(", names(flights)[colSums(is.na(flights)) > 0], "))")
flights %>% arrange_(.dots = nm1) %>% head()
#   year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum
#  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>   <chr>  <int>   <chr>
#1  2013     1     2       NA           1545        NA       NA           1910        NA      AA    133    <NA>
#2  2013     1     2       NA           1601        NA       NA           1735        NA      UA    623    <NA>
#3  2013     1     3       NA            857        NA       NA           1209        NA      UA    714    <NA>
#4  2013     1     3       NA            645        NA       NA            952        NA      UA    719    <NA>
#5  2013     1     4       NA            845        NA       NA           1015        NA      9E   3405    <NA>
#6  2013     1     4       NA           1830        NA       NA           2044        NA      9E   3716    <NA>
# Variables not shown: origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
#   time_hour <time>.
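Because arrange_() is deprecated, here is a sketch of the same idea for current dplyr (assuming dplyr 1.0.0 or later, plus rlang). across() applies desc(is.na(.x)) to every column; columns without missing values give all-FALSE keys and so do not change the order, and the sort priority follows the column order. The string vector nm1 can also be spliced into arrange() directly with rlang::parse_exprs() and !!!. The tibble is hypothetical.

```r
library(dplyr)

# Hypothetical toy data, for illustration only
df <- tibble(
  id = 1:5,
  a  = c(1, NA, 3, 4, NA),
  b  = c(NA, 2, 3, NA, NA)
)

# One desc(is.na(.x)) sort key per column, in column order
out <- df %>% arrange(across(everything(), ~ desc(is.na(.x))))

# Direct translation of arrange_(.dots = nm1): parse the strings, then splice
nm1 <- paste0("desc(is.na(", names(df)[colSums(is.na(df)) > 0], "))")
out2 <- df %>% arrange(!!!rlang::parse_exprs(nm1))
```

With this data, rows missing a come first (the one also missing b ahead of the other), followed by rows missing only b, and the complete row last.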