Iskar Jarak Iskar Jarak - 1 month ago 7
C Question

Efficient algorithm to produce the n-way intersection of sorted arrays in C

I need to produce the intersection between some sorted arrays of integers in C. I know how to find the intersection between two sorted arrays, but I need to do this for more than two arrays, efficiently and without prior knowledge of the number of arrays. I can impose a sensible limit on the maximum number - let's say ten for now. These arrays could be anywhere from a few items to a couple of hundred thousand items long, and are by no means necessarily the same length.

Pseudo-code for producing the intersection of two sorted arrays:

while i < m and j < n do:
if array1[i] < array2[j]:
increment i
else if array1[i] > array2[j]:
increment j
else
add array1[i] to intersection(array1, array2)
increment i
increment j


I am working with C, and I am after a clear explanation rather than code.

Answer

I assume that all your arrays are sorted. Lets assume we have arrays A_1 to A_n. Have a counter for each array (thus, we have n counters i_1 to i_n, just like you did it for two arrays).

Now we introduce a minimum-heap, that contains the whole arrays in a manner such that the minimum array is that array with the currently lowest number pointed to by the corresponding pointer. This means, we can at each moment, retrieve the array with the currently lowest number pointed to.

Now, we extract the minimum array from the heap and remember it. We go on extracting the minimum array as long as the number pointed to stays the same. If we extract all arrays (i.e. if all arrays have the same currently lowest pointed to number), we know that this number is in the intersection. If not (i.e. if not all arrays do contain the same currently lowest pointed to number), we know that the number we are currently examining can not be in the intersection. Thus, we increment all counters to the arrays already extracted and put them back into the heap.

We do this until we find one array's pointer reaching the array's end. I'm sorry for the undetailed description, but I do not have enough time to work it out in more detail.

Edit.

If you have one array with very few elements, it might be useful to just binary-search the other arrays for these numbers or checking these numbers using a hash table.