chinsoon12 - 10 months ago 65

R Question

I am trying to sort the rows of a `data.table` using `setorder`, but the result differs from `base::sort`:

```
library(data.table)

dt <- data.table(A=c("AA","AB","Ab"))
setorder(dt, A)
identical(dt[,A], sort(dt[["A"]]))
#[1] FALSE

df <- data.frame(A=c("AA","AB","Ab"))
identical(df[order(df$A),"A"], sort(df[["A"]]))
#[1] TRUE
```

Answer

We can reproduce this with `sort` if we set the method to `"radix"`, which base R adopted from data.table's sorting:

```
sort(dt[["A"]])
#[1] "AA" "Ab" "AB"
sort(dt[["A"]], method = "radix")
#[1] "AA" "AB" "Ab"
```

In `help("sort")` we find:

Except for method "radix", the sort order for character vectors will depend on the collating sequence of the locale in use: see Comparison.

...However, there are some caveats with the radix sort: If x is a character vector, all elements must share the same encoding. Only UTF-8 (including ASCII) and Latin-1 encodings are supported. Collation always follows the "C" locale.
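
To see why `"AB"` comes before `"Ab"` under the "C" locale, it helps to look at code points: that locale compares strings by their byte values, and uppercase ASCII letters have smaller values than lowercase ones. A minimal base R sketch (the variable names `x`, `second`, `codes` and `sorted` are only illustrative):

```r
x <- c("AA", "AB", "Ab")

# Second character of each string, the position where the three values differ
second <- substr(x, 2, 2)

# Code point of each of those characters
codes <- vapply(second, utf8ToInt, integer(1))
codes

# Radix sort follows this byte-wise ordering regardless of the session locale
sorted <- sort(x, method = "radix")
sorted
```

Uppercase `B` has a smaller code point than lowercase `b`, so `"AB"` sorts first, which matches the `setorder` result in the question.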

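Accordingly, after switching the session to the "C" locale, the default method returns the same order as `setorder`:

```
Sys.setlocale(category = "LC_ALL", locale = "C")
sort(dt[["A"]])
#[1] "AA" "AB" "Ab"
```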
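
Calling `Sys.setlocale` changes the locale for the rest of the R session. If that is not wanted, one option is to switch only the collation category temporarily and restore it afterwards. The sketch below assumes a hypothetical helper named `sort_c_locale`, which is not part of base R or data.table:

```r
# Hypothetical helper: sort a character vector under "C" collation,
# then restore whatever collation locale was active before the call.
sort_c_locale <- function(x) {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  sort(x)
}

before <- Sys.getlocale("LC_COLLATE")
result <- sort_c_locale(c("AA", "Ab", "AB"))
result
after <- Sys.getlocale("LC_COLLATE")  # same value as `before`
```

For plain sorting, `sort(x, method = "radix")` gives the same "C" ordering without touching the locale at all, so a helper like this is mainly useful when other locale-dependent code has to agree with it.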