Doug Fir Doug Fir - 4 years ago 144
R Question

After pivoting a df using gather and spread can I order the rows based on new Var variable?

Sample data:

sample_data <- data.frame(
Device = c("Desktop", "Desktop", "Desktop", "Tablet", "Tablet", "Tablet", "Mobile", "Mobile", "Mobile"),
Date_Range = seq(as.Date("2017-01-01"), as.Date("2017-01-30"), by = 1),
Sessions = ceiling(rnorm(90, mean = 3000, sd = 300)),
Conversion1 = ceiling(rnorm(90, mean = 800, sd = 100)),
Conversion2 = ceiling(rnorm(90, mean = 400, sd = 50))
)


Some wrangling with dplyr

library(dplyr)
example <- tbl_df(sample_data) %>%
group_by(Date_Range) %>%
summarize(Sessions = sum(Sessions),
Con1 = sum(Conversion1),
Con2 = sum(Conversion2)) %>%
mutate(Registration_Rate = round(Con1/Sessions,2),
Conversion_Rate = round(Con2/Sessions,2)) %>%
gather(Var, Val, - Date_Range) %>%
group_by(Date_Range) %>%
spread(Date_Range, Val)


This results in a left to right reading trend, by date, of KPIs including sessions, conversions and resulting conversion rates.

Screen:
enter image description here

I would like to order these rows. Currently the order of rows reads:

Con1, Con2, Conversion_Rate, Registration_Rate, Sessions (Looks like it's in alphabetical order).

Specifically I would like the table in the following order:
Sessions, Con1, Con2, Registration_Rate, Conversion_Rate.

I looked at row.names(example) but that returns a character vector of 1:5 (yes character).

Ideally I would be able to sort the rows by the Var variable created during the gather() operation. So sort them on name rather than index. But not at the sacrifice of simplicity.

Is there a straight forwards way to do this?

Answer Source

One way could be:

example <- tbl_df(sample_data) %>%
  group_by(Date_Range) %>%
  summarize(Sessions = sum(Sessions),
            Con1 = sum(Conversion1),
            Con2 = sum(Conversion2)) %>%
  mutate(Registration_Rate = round(Con1/Sessions,2),
         Conversion_Rate = round(Con2/Sessions,2)) %>%
  gather(Var, Val, - Date_Range) %>%
  group_by(Date_Range) %>%
  spread(Date_Range, Val)%>%
  mutate(Var =  factor(Var, levels = c("Sessions", "Con1", "Con2", "Registration_Rate", "Conversion_Rate"))) %>%
  arrange(Var)

Var is a character vector, so turning it into a factor and setting the levels to the desired order means that arrange() will sort according to these levels.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download