ApathyBear ApathyBear - 2 months ago 7
Java Question

Understanding why Java selection rank returns max() as final result

I am working out a solution to the following question:


Describe an algorithm to find the smallest one million numbers in one
billion numbers. Assume that the computer memory can hold all one
billion numbers.


The book gives the a selection rank solution but I am having a hard time understanding a few parts of it:

public static int partition(int[] array, int left, int right, int pivot) {
while (true) {
while (left <= right && array[left] <= pivot) {
left++;
}

while (left <= right && array[right] > pivot) {
right--;
}

if (left > right) {
return left - 1;
}
swap(array, left, right);
}
}

public static int rank(int[] array, int left, int right, int rank) {
int pivot = array[randomIntInRange(left, right)];
int leftEnd = partition(array, left, right, pivot); // returns end of left partition
int leftSize = leftEnd - left + 1;
if (leftSize == rank + 1) {
return max(array, left, leftEnd);
} else if (rank < leftSize) {
return rank(array, left, leftEnd, rank);
} else {
return rank(array, leftEnd + 1, right, rank - leftSize);
}
}


I understand most of it, but I do no understand the following two lines above:

if (leftSize == rank + 1) {
return max(array, left, leftEnd);


1. Why are we returning the max of the three variables?

2. Shouldn't we just be returning
array[left:leftEnd]
or something of that nature?

Answer

Congratulations on trying to learn something by carefully studying a book. It's a key skill that seems to be getting rarer.

It makes general sense if the definition of the return value of rank is "there exist exactly a million numbers less than or equal to rank." The definition of max would be something like:

int t = array[left];
for (int i = left + 1; i <= leftEnd; i++) 
  t = Math.max(t, array[i]);
return t;

Returning the max value is beyond the problem statement and kind of weird. It would be better and simpler just to partition the elements so that the max million are at the top: array[0] through array[999999]. Then find the max only if that's actually needed

Note that because rank is tail recursive, there's a simple iterative version of the same code that I think would be clearer.

I'm also not convinced this code is correct. leftSize == rank in the check makes more sense than leftSize == rank + 1. But without more definitions and calling code, it's hard to say for sure.

Comments