Nick Knauer Nick Knauer - 10 days ago 7
R Question

Reshaping Dataframe to have top 2 values

FAMILY<- c('FAMILYA', 'FAMILYA', 'FAMILYA', 'FAMILYA', 'FAMILYA', 'FAMILYB', 'FAMILYB', 'FAMILYB', 'FAMILYB', 'FAMILYB', 'FAMILYC', 'FAMILYC', 'FAMILYC', 'FAMILYC', 'FAMILYC')

CHILDREN<-c('JAKE', 'PETE', 'JASON', 'KEVIN', 'ALFRED','DALE', 'STEVE', 'MELISSA', 'DAN', 'THOMAS', 'CAIT', 'BRANDON', 'DEAN', 'ADAM', 'KELSEY')

CHANGE<-c(1000, -1000, 2000, 3000, 5000, 100, 300, 1234, -1022, -1111, -1112, 1000, 1002, 2131, 1231)

df1<-data.frame(FAMILY, CHILDREN, CHANGE)

df1

FAMILY CHILDREN CHANGE
1 FAMILYA JAKE 1000
2 FAMILYA PETE -1000
3 FAMILYA JASON 2000
4 FAMILYA KEVIN 3000
5 FAMILYA ALFRED 5000
6 FAMILYB DALE 100
7 FAMILYB STEVE 300
8 FAMILYB MELISSA 1234
9 FAMILYB DAN -1022
10 FAMILYB THOMAS -1111
11 FAMILYC CAIT -1112
12 FAMILYC BRANDON 1000
13 FAMILYC DEAN 1002
14 FAMILYC ADAM 2131
15 FAMILYC KELSEY 1231


I want to transform this dataframe to have 4 new extra columns: First two showing the 1) largest valued Child, 2) 2nd Largest valued Child and the last two columns showing the 3) smallest value Child, 4) 2nd Smallest value Child.

I would also like the Change next to it's respective Child.

Final format should look like this:

FAMILY TOTAL CHANGE INCREASE #1 INCREASE #2 DECREASE #1 DECREASE #2
FAMILYA 10000 ALFRED: 5000 KEVIN: 3000 PETE: -1000 JAKE: 1000
FAMILYB -499 MELISSA: 1234 STEVE: 300 THOMAS: -1111 DAN: -1022
FAMILYC 4252 ADAM: 2131 KELSEY: 1231 CAIT: -1112 BRANDON: 1000


If you think it would be easier to have the values of each child in a separate column next to it that works too but this is the concept I need help executing.

Any help would be great, thanks!

Answer
library(dplyr)
library(tidyr)

# below function helps to get the second max  or second min 
myfun <-  function(x, y) {
  u <- unique(x)
  u <- sort(u, decreasing = TRUE)
  if(y<0)
    u[length(x)-1]
  else
    u[y]
}

df2 <- df1 %>% group_by(FAMILY) %>% 
      summarise(a1=CHILDREN[which(CHANGE == max(CHANGE))]   , a2 = max(CHANGE), 
                b2 = myfun(CHANGE, 2)          , b1=CHILDREN[which(CHANGE == b2)] ,  
                c1=CHILDREN[which(CHANGE == min(CHANGE))]   , c2 = min(CHANGE), 
                d2 = myfun(CHANGE,-2)          , d1=CHILDREN[which(CHANGE == d2)]) 
#df2
#   FAMILY      a1    a2      b1    b2     c1    c2     d1    d2
#   <fctr>  <fctr> <dbl>  <fctr> <dbl> <fctr> <dbl> <fctr> <dbl>
#1 FAMILYA  ALFRED  5000  3000  KEVIN   PETE -1000  1000    JAKE
#2 FAMILYB MELISSA  1234   300  STEVE THOMAS -1111 -1022     DAN
#3 FAMILYC    ADAM  2131  1231 KELSEY   CAIT -1112  1000 BRANDON

# little clumpsy here... would like if someone could suggest a better way of uniting efficiently
df3 <- unite(df2, "A1", 2,3,sep = ":")
df4 <- unite(df3, "B1", 4,3,sep = ":")
df5 <- unite(df4, "c1", 4,5,sep = ":")
df6 <- unite(df5, "c1", 6,5,sep = ":")

#df6
#   FAMILY           A1           B1           c1         c1
#   <fctr>        <chr>        <chr>        <chr>      <chr>
#1 FAMILYA  ALFRED:5000  KEVIN:3000   PETE:-1000    JAKE:1000
#2 FAMILYB MELISSA:1234   STEVE:300 THOMAS:-1111    DAN:-1022
#3 FAMILYC    ADAM:2131 KELSEY:1231   CAIT:-1112 BRANDON:1000

Note : forgot to add TOTAL_CHANGE column add TOTAL CHANGE = sum(CHANGE) inside summarise() and add +1 in unite() column indexes