J. Doe. J. Doe. - 1 month ago 13
R Question

Long to wide with a character twist

I have a problem that looks like this and this but I cannot get what I am after.

I have the following sample dataset:

country_code=c('USA','USA','USA','USA','USA','USA','CHN','CHN','CHN','CHN','CHN','CHN')
target_var=c('V1','V1','V1' ,'V2' ,'V2' ,'V2' ,'V1' ,'V1' ,'V1','V2' ,'V2' ,'V2')
VAR= c('X7','X8','X140','X114','X18','X28','X29','X22','X2','X22','X23','X24')
Ranking= c(1 ,2.5 ,2.5 ,1.5 ,1.5 ,1.5 , 1 ,2 ,3 ,1.5 ,1.5 ,3)
df<-data.frame(country_code,target_var,VAR,Ranking)


And I need to convert from a long to wide format for all combinations of country_code and target_var. The twist I was referring to is that I only want to retain the top X VARs by Ranking (let's say 2 for this example), retaining the ties. So the end result of the example dataset will look like:

Please note that for USA the "ties" are retained so instead of the top 2 I get the top 3. The ties could have occurred in CHN instead.

enter image description here

I have tried with a nested loop and rbind, but I am not able to make it work. I also looked at a number of threads re long to wide but the vast majority only "reshape" numbers, not characters, which is what the VARs are. I suspect a dplyr solution is what would make sense but I can't get it work. Thank you

Answer

We can use top_n to subset the rows, and then spread from 'long' to 'wide'

library(tidyr)
df %>% 
  group_by(country_code, target_var) %>% 
  top_n(2, wt = Ranking) %>%
  mutate(n = row_number()) %>% 
  select(-Ranking) %>% 
  spread(n, VAR, sep="")
Comments