chinsoon12 - 1 month ago
R Question

Comparing data.table::setorder against base::sort

I am trying to sort the rows of a data.table (R-3.3.1 Win x64 & data.table_1.9.6) and found that the result of setorder is different from that of base::sort. Am I using setorder correctly?

library(data.table)
dt <- data.table(A=c("AA","AB","Ab"))
setorder(dt, A)
identical(dt[,A], sort(dt[["A"]]))
#[1] FALSE

df <- data.frame(A=c("AA","AB","Ab"))
identical(df[order(df$A),"A"], sort(df[["A"]]))
#[1] TRUE

Answer

You are using setorder correctly. We can reproduce its result with sort by setting the method to "radix", which base R adopted from data.table's sorting:

sort(dt[["A"]])
#[1] "AA" "Ab" "AB"
sort(dt[["A"]], method = "radix")
#[1] "AA" "AB" "Ab"
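
As a rough check of the point above, the sketch below compares the setorder result with the radix sort. It also shows one possible way to get the locale-dependent order for a data.table. The workaround rests on an assumption: that calling order() outside of `[.data.table` keeps data.table from substituting its own ordering. The names x, idx and dt_locale are only illustrative.

```r
library(data.table)

x <- c("AA", "AB", "Ab")
dt <- data.table(A = x)

# setorder() reorders dt by reference using data.table's own ordering.
setorder(dt, A)
identical(dt[["A"]], sort(x, method = "radix"))
#[1] TRUE

# Assumed workaround: compute the row index with base order() first,
# so the collation of the current locale decides the order.
idx <- order(dt[["A"]])
dt_locale <- dt[idx]
identical(dt_locale[["A"]], sort(x))
#[1] TRUE
```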

In help("sort") we find:

Except for method "radix", the sort order for character vectors will depend on the collating sequence of the locale in use: see Comparison.
...

However, there are some caveats with the radix sort: If x is a character vector, all elements must share the same encoding. Only UTF-8 (including ASCII) and Latin-1 encodings are supported. Collation always follows the "C" locale.
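
To see why "C" collation puts "AB" before "Ab", it can help to look at the code points of the two characters that differ. This is only a small illustration for ASCII strings like the ones in the question; the variable x is introduced here for the example.

```r
x <- c("AA", "AB", "Ab")

# Upper-case "B" has a smaller code point than lower-case "b".
utf8ToInt("B")
#[1] 66
utf8ToInt("b")
#[1] 98

# Ordering by code point therefore places "AB" ahead of "Ab".
sort(x, method = "radix")
#[1] "AA" "AB" "Ab"
```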

# With the locale set to "C", the default sort method gives the same order as "radix"
Sys.setlocale(category = "LC_ALL", locale = "C")
sort(dt[["A"]])
#[1] "AA" "AB" "Ab"
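
Setting LC_ALL changes the locale for the rest of the session. If that is not wanted, one option is to switch only the collation category and restore it afterwards. The helper below is a minimal sketch of that idea; sort_c_collate is a hypothetical name, not a function from base R or data.table.

```r
# Hypothetical helper: sort under "C" collation, then restore the
# previous LC_COLLATE setting when the function exits.
sort_c_collate <- function(x) {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  sort(x)
}

x <- c("AA", "AB", "Ab")
sort_c_collate(x)
#[1] "AA" "AB" "Ab"
```

For plain character vectors, sort(x, method = "radix") gives the same order without touching the locale at all, so the helper is mainly useful when other locale-dependent code has to run under "C" collation as well.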