Matias Andina - 28 days ago
R Question

Sort data frame column by factor

Suppose I have a data frame with 3 columns (name, y, sex), where "name" (called x in the code below) is character, "y" is a numeric value and "sex" is a factor.

sex<-c("M","M","F","M","F","M","M","M","F")
x<-c("MARK","TOM","SUSAN","LARRY","EMMA","LEONARD","TIM","MATT","VIOLET")
name<-as.character(x)
y<-rnorm(9,8,1)
score<-data.frame(x,y,sex)
score
x y sex
1 MARK 6.767086 M
2 TOM 7.613928 M
3 SUSAN 7.447405 F
4 LARRY 8.040069 M
5 EMMA 8.306875 F
6 LEONARD 8.697268 M
7 TIM 10.385221 M
8 MATT 7.497702 M
9 VIOLET 10.177969 F


If I wanted to order it by y I would use

score[order(score$y),]
x y sex
1 MARK 6.767086 M
3 SUSAN 7.447405 F
8 MATT 7.497702 M
2 TOM 7.613928 M
4 LARRY 8.040069 M
5 EMMA 8.306875 F
6 LEONARD 8.697268 M
9 VIOLET 10.177969 F
7 TIM 10.385221 M


So far, so good: each name keeps its correct score. But how can I reorder the data so that the M and F levels are not mixed? I need to sort by y and at the same time keep the factor levels in separate groups.

Finally, I would like to go a step further and involve the character column. The example doesn't show it, but what if there were tied "y" values and I had to order again within each factor level (e.g., TIM and TOM both got 8.4 and I have to put them in alphabetical order)?

I was thinking about the by function, but it returns a list, which doesn't really help. I think there must be a similar function that can be applied to a data frame and returns a data frame.

Thank you

TO MAKE THE POINT CLEAR

sep<-split(score,score$sex)
sep$M<-sep$M[order(sep$M[,2]),]
sep$M
x y sex
1 MARK 6.767086 M
8 MATT 7.497702 M
2 TOM 7.613928 M
4 LARRY 8.040069 M
6 LEONARD 8.697268 M
7 TIM 10.385221 M

sep$F<-sep$F[order(sep$F[,2]),]
sep$F
x y sex
3 SUSAN 7.447405 F
5 EMMA 8.306875 F
9 VIOLET 10.177969 F

merged<-rbind(sep$M,sep$F)
merged
x y sex
1 MARK 6.767086 M
8 MATT 7.497702 M
2 TOM 7.613928 M
4 LARRY 8.040069 M
6 LEONARD 8.697268 M
7 TIM 10.385221 M
3 SUSAN 7.447405 F
5 EMMA 8.306875 F
9 VIOLET 10.177969 F


I know how to do that if I have 2 or 3 factor levels. But what if I had many levels, say 20? Should I write a for loop?

Answer

order takes multiple arguments, and it does just what you want:

with(score, score[order(sex, y, x),])
##         x        y sex
## 3   SUSAN 6.636370   F
## 5    EMMA 6.873445   F
## 9  VIOLET 8.539329   F
## 6 LEONARD 6.082038   M
## 2     TOM 7.812380   M
## 8    MATT 8.248374   M
## 4   LARRY 8.424665   M
## 7     TIM 8.754023   M
## 1    MARK 8.956372   M
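
The random data above has no ties, so the third sort key never comes into play. As a minimal sketch, the example below uses made-up scores (an assumption for illustration, not the question's data) in which TIM and TOM both score 8.4. It also checks that a loop-free version of the split / sort / rbind approach from the question gives the same row order, and shows that the order of the groups follows the factor's levels.

```r
# Hypothetical data with a deliberate tie: TIM and TOM both score 8.4.
score <- data.frame(
  x   = c("MARK", "TOM", "SUSAN", "LARRY", "EMMA", "TIM"),
  y   = c(6.8, 8.4, 7.4, 8.0, 8.3, 8.4),
  sex = factor(c("M", "M", "F", "M", "F", "M")),
  stringsAsFactors = FALSE
)

# Sort by sex first, then by y within each sex, then by name to break ties.
sorted <- score[order(score$sex, score$y, score$x), ]
print(sorted)

# The split / sort / rbind idea from the question, written with lapply
# instead of a loop so it works for any number of factor levels.
pieces <- lapply(split(score, score$sex), function(d) d[order(d$y, d$x), ])
merged <- do.call(rbind, unname(pieces))

# Both approaches return the rows in the same order.
print(identical(sorted$x, merged$x))

# Factors sort by level order, not alphabetically, so listing "M" first
# puts the M rows on top, as in the question's merged output.
score$sex <- factor(score$sex, levels = c("M", "F"))
m_first <- score[order(score$sex, score$y, score$x), ]
print(m_first)
```

To sort the numeric key in descending order while leaving the others ascending, negating it is a common idiom: order(sex, -y, x).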