Nell Nell - 3 months ago 12
R Question

Sort a data table with specific value order

I have a data table:

> DT <- data.table(Tag1 = c(22,253,6219,6219,252862,252864,312786,312812),
+ Tag2 = c(22,255,6220,252857,252863,252865,251191,252863),
+ Date= as.Date(as.character(c("7/25/2008","6/15/2000","6/30/2000","9/6/2002","9/6/2002","9/6/2002","9/3/2003","9/5/2003")),format = "%m/%d/%Y"))
> DT
Tag1 Tag2 Date
1: 22 22 2008-07-25
2: 253 255 2000-06-15
3: 6219 6220 2000-06-30
4: 6219 252857 2002-09-06
5: 252862 252863 2002-09-06
6: 252864 252865 2002-09-06
7: 312786 251191 2003-09-03
8: 312812 252863 2003-09-05


I would like to sort it in ascending order by 3 columns: Tag1, Tag2 and Date.
I tested:

> test <- DT[order(Tag1, Tag2, Date)]
> test
Tag1 Tag2 Date
1: 22 22 2008-07-25
2: 253 255 2000-06-15
3: 6219 6220 2000-06-30
4: 6219 252857 2002-09-06
5: 252862 252863 2002-09-06
6: 252864 252865 2002-09-06
7: 312786 251191 2003-09-03
8: 312812 252863 2003-09-05


However, I would like to sort the data table as follows:

> test
Tag1 Tag2 Date
1: 22 22 2008-07-25
2: 253 255 2000-06-15
3: 6219 6220 2000-06-30
4: 6219 252857 2002-09-06
5: 252862 252863 2002-09-06
6: 312812 252863 2003-09-05
7: 252864 252865 2002-09-06
8: 312786 251191 2003-09-03


In particular, the duplicated values for Tag1 or Tag1 should be disposed one below the other (for example: 6219 for Tag1 and 252863 for Tag2).
How can I do this ?

Answer

Old Order

df[order(Tag1, Tag2, Date)]
#      Tag1   Tag2       Date
# 1:     22     22 2008-07-25
# 2:    253    255 2000-06-15
# 3:   6219   6220 2000-06-30
# 4:   6219 252857 2002-09-06
# 5: 252862 252863 2002-09-06
# 6: 252864 252865 2002-09-06
# 7: 312786 251191 2003-09-03
# 8: 312812 252863 2003-09-05

New Order

setcolorder(dt1 <- df[order(Tag1, Tag2, Date)][, .(Tag1, Date), by = Tag2], c("Tag1", "Tag2", "Date"))
dt1
#     Tag1   Tag2       Date
# 1:     22     22 2008-07-25
# 2:    253    255 2000-06-15
# 3:   6219   6220 2000-06-30
# 4:   6219 252857 2002-09-06
# 5: 252862 252863 2002-09-06
# 6: 312812 252863 2003-09-05
# 7: 252864 252865 2002-09-06
# 8: 312786 251191 2003-09-03

Solution of @akrun in the comments disturbs the structure of the data. Here is the comparison. Look at #4: 6219 should have 252857 instead of 251191

df[,lapply(df, sort)]
#      Tag1   Tag2       Date
# 1:     22     22 2000-06-15
# 2:    253    255 2000-06-30
# 3:   6219   6220 2002-09-06
# 4:   6219 251191 2002-09-06
# 5: 252862 252857 2002-09-06
# 6: 252864 252863 2003-09-03
# 7: 312786 252863 2003-09-05
# 8: 312812 252865 2008-07-25
Comments