Karan Tongay Karan Tongay - 1 year ago 54
Java Question

How to customize Collections.sort() to sort strings in order of their numerical prefix

I have a

containing below values:

[1h, 0h, 2h, 10h, 4h, 9h, 3h, 7h, 8h, 6h, 5h]

When I sort it using
, the order becomes:

[0h, 10h, 1h, 2h, 3h, 4h, 5h, 6h, 7h, 8h, 9h]

Instead, it should be:

[0h, 1h, 2h, 3h, 4h, 5h, 6h, 7h, 8h, 9h, 10h]

How can I achieve this?

Answer Source

The output you quoted in your question is normal. If you look at the API description of Collection.sort you'll see:

Sorts the specified list into ascending order, according to the natural ordering of its elements. All elements in the list must implement the Comparable interface. Furthermore, all elements in the list must be mutually comparable (that is, e1.compareTo(e2) must not throw a ClassCastException for any elements e1 and e2 in the list).

By natural ordering they mean the ordering specified by their compareTo-method (the interface Comparable only has the method compareTo) which tells wether the this-object is is greater (return strict positive integer), or are equal (return 0), less(return strict negative integer) than the object passed as an argument in the ordering. To know if an object a should get before object b, it just calls a.compareTo(b).

  • If it's result is strict positive (> 0) the sort method will set a after b in the sorted array.
  • If it's result is strict negative (< 0) the sort method will set a before b in the sorted array.
  • If it's result is 0 the sort method won't change the order they are in.

I would refer to the API of the compareTo method in the String class, it is explained in detail there how strings are ordered and what the method returns.
What it does is just iterating character per character over each string and compare the characters at the same position untill a difference is found. If at that difference, the character of the this-string has a higher Unicode value, it will return a strict positive integer (the this stringer is greater) else it will return a strict negative integer (the this-string is lower than the argument). If no difference can be found it will return 0 (the this-string and the argument have same "order value").
Now the problem in your case(and for many people who order strings for the first time) is that integers can't be ordered with that Unicode-ordering because it goes character per character. If it has to sort String s1 = "19" and String s2 = "1119" it will call s1.compareTo(s2). The first difference is at the second character s1 has 9 and s2 has 1 at that position. Since 1 comes before 9 in the Unicode-table (1 has value hexadecimal value 31 and 9 has hexadecimal value 39), s1.compareTo(s2) will return a strict positive integer (see the link to the API to see which value). Thus it will put s2 before s1 in the sorted array even if 1119 > 19!

So now you know why it sorts like that, let's see what solutions we have:

  1. Map your array-elements to the decimal value of their begin-part using the Integer.parseInt method, sort these decimal values and map the resulting sorted arrays back to your original elements. I wouldn't really recommand this solution because of the extra collections and the extra code due to the mapping.
  2. Create your own Comparator-object, specify in there how you want your strings to be ordered (for example using the Integer.parseInt method) and pass it as argument to the Collections.sort method. By passing it as an argument, you tell the sort methode to use your Comparator-object instead of the classic compareTo methode. This link shows how a Comparator is used to sort an array. It's quite the same as using it to sort an arraylist except for the fact that Collections.sort won't change the arraylist and give you another sorted arraylist but Arrays.sort will change the order in the array you gave as argument (you'll notice it in the tutorial).

Hope it helps!

Good luck