DhiwaTdG DhiwaTdG - 1 month ago 9
C Question

Algorithm for intersection of n arrays in C

I need to write a function which returns the intersection

(AND condition)
of all the arrays generated in every iteration for an array of queries.

If my query is given by:
query[] = {"num>20", "avg==5", "deviation != 0.5"}
then,
n
runs from
0
to
length of query
. The query is passed on to a function (
get_sample_ids
) which compares the condition against a list of samples possessing certain information. The returned array numbers from
get_sample_ids
are the index of the respective samples.

query[] = {"num>20", "avg==5", "deviation != 0.5"}
int intersected_array*;
for n=0:query.length-1
int arr* = get_sample_ids(query[n]);
// n=0: [1, 7, 4, 2, 6]
// n=1: [3, 6, 2]
// n=2: [6, 2]
end;

Expected output: intersected_array* = [6, 2]


I've coded an implementation which has 2 arrays (
arr*
,
temp*
). For every array returned in the iteration, it is first stored in the
temp*
array and the intersection of
arr*
and
temp*
is stored in
arr*
. Is this an optimal solution or what is the best approach?

Answer

This is quite efficient but could be tiresome to implement (haven't tried it).

  1. Determine the shortest array. Benefit of using C is that if you don't know their length, you can use the pointers to arrays to determine it if they are placed sequentially in memory.
  2. Make a <entry,boolean> hash map for the entries in shortest. We know the size and if anything it's only going down in next steps.
  3. Iterate through an array. Start by initiating the whole map to false. For each entry check it in map.
  4. Iterate through map deleting all the entries that weren't checked. Set all the values to false.
  5. If there any new arrays left go back to step 3. with a new array.
  6. The result are the keys in the final map.

It looks like much but we didn't have to resort to any high complexity measures. Key to good performance is using hash map because of constant access time.

Alternatively:

Make the map be <entry,int>. This way you can count up all the recurrences and don't have to reset it at every iteration, which adds to complexity.

At the end just compare the number of array's to the the values in map. Those that match are your solution.

Complexity:

Seems like O(n).

Comments