1'' - 3 months ago 32

C++ Question

I'm porting C++ code which uses

`std::nth_element`

`std::partition`

`nth_element`

`nth_element`

Canonically,

`nth_element`

`nth_element`

`partition`

`nth_element`

`while (first!=last) {`

while (pred(*first)) {

++first;

if (first==last) return first;

}

do {

--last;

if (first==last) return first;

} while (!pred(*last));

swap (*first,*last);

++first;

}

return first;

where pred is a function that evaluates whether an element should be in the first bucket. Basically, this function iteratively finds the outermost pair of elements of the array which are in the wrong place, and swaps them, stopping when the pair of elements are the same element.

Here are my initial ideas on parallelizing

`nth_element`

`partition`

Partition could be implemented using atomic comparisons and swaps, but I'm not sure how to cover all the possible pairs of values which could be swapped. There's no obvious way to divide the work among multiple threads, since partition requires comparing elements which may be either right next to each other or at opposite ends of the array. I also don't see a way to avoid having thread B compare with an element which has already been swapped by thread A, which is inefficient.

nth_element seems even less parallelizable, since it is recursive: each partition depends upon the elements having been partially ordered by the previous partition.

Presumably, an efficient parallelization strategy will require a completely different approach than the typical serial code, for both functions.

Do efficient parallel implementations of

`nth_element`

`partition`

Answer

Cuda THRUST has partition function implemented (http://docs.nvidia.com/cuda/thrust/index.html#reordering).

The main idea should be following: Using prefix sums to calculate position of element in the array and then rearrange the array.