Lauren Fitch - 3 months ago
R Question

sort() produces different results in Ubuntu and Windows

I have a vector that is sorted differently when I run the same code on my Windows machine and on my remote Ubuntu server.

Windows:

> u <- getNodes(network)
> head(u)
[1] "-1336623650" "-1749477680" "539" "-1036241023" "6135" "-44987577"
> uid <- sort(u)
> head(uid)
[1] "-1000019199" "-1000022360" "-1000039153" "-1000044219" "-1000069199" "-1000099640"


Ubuntu:

> u <- getNodes(network)
> head(u)
[1] "-1336623650" "-1749477680" "539" "-1036241023" "6135"
[6] "-44987577"
> uid <- sort(u)
> head(uid)
[1] "10" "100" "1000" "10000" "-1000019199"
[6] "-1000022360"


Both implementations of R have the same packages loaded and are the same R version (3.3.1). Ubuntu is 13.10 and Windows is Windows 7.

Answer

Sorting strings (which is what you are doing) in R depends on the collation "locale", and the default locale differs between Windows and Linux systems. Be careful, though: no locale will sort these strings in correct numerical order. If you want numerical order, you have to sort a vector of numbers.
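
To illustrate the point about numerical order, here is a minimal sketch that uses the six IDs shown in the question's head(u) output. It assumes every ID parses cleanly as a number; the variable names u_num and uid_numeric are only illustrative. Ordering by the converted values while indexing the original vector keeps the IDs as strings.

```r
# Sample IDs copied from the head(u) output in the question
u <- c("-1336623650", "-1749477680", "539", "-1036241023", "6135", "-44987577")

# Convert to numbers only for ordering (assumes every ID is a valid number)
u_num <- as.numeric(u)

# Reorder the original strings by their numeric value
uid_numeric <- u[order(u_num)]
print(uid_numeric)
# [1] "-1749477680" "-1336623650" "-1036241023" "-44987577" "539" "6135"
```

Because the comparison is done on numbers, this result does not depend on the locale of either system.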

Grab the value of Sys.getlocale("LC_COLLATE") from each system and compare them. In my package, I run the code below at the entry point and report the original value in packageStartupMessage.

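# Remember the current collation locale and restore it when the enclosing function exits
collateOrigValue <- Sys.getlocale("LC_COLLATE")
on.exit(Sys.setlocale("LC_COLLATE", collateOrigValue), add = TRUE)
# Use byte-wise "C" collation so sort() gives the same order on every platform
Sys.setlocale("LC_COLLATE", "C")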
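
Since on.exit() only has an effect inside a function, the same pattern can be sketched as a small helper. The name sort_c_locale is hypothetical and not part of any package; the sample IDs are taken from the Ubuntu head(uid) output in the question.

```r
# Hypothetical helper: sort a character vector under "C" collation,
# then restore whatever LC_COLLATE was active before the call.
sort_c_locale <- function(x) {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  sort(x)
}

# IDs from the Ubuntu output in the question
ids <- c("10", "100", "1000", "10000", "-1000019199", "-1000022360")

# In the "C" locale "-" compares lower than any digit,
# so the negative IDs come first on every platform
print(sort_c_locale(ids))
```

This should give the same order on both Windows and Ubuntu, but it is still a string order, not a numerical one.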

See also https://stat.ethz.ch/R-manual/R-devel/library/base/html/locales.html
