yahh yahh - 6 months ago 43
Java Question

codility absolute distinct count from an array

so i took the codility interview test yesterday and was informed today that i failed, unfortunately i wasnt given any other information by either codility nor the employer as to where i screwed up so i would appreciate some help in knowing where i went wrong. i know codility pays alot of emphasis on how fast the program runs and how it behaves for large numbers. now i didnt copy paste the questions so this the approx of what i remember


  1. count the number of elements in an array a which are absolute distinct, what it means is if an array had -3 and 3 in it these numbers are not distinct because|-3|=|3|. i think an example would clear it up better



a={-5,-3,0,1,-3}
the result would be 4 because there are 4 absolute distinct elements in this array.

The question also stated that a.length would be <=10000, and most importantly it stated that assume that the array is sorted in ascending order but i didnt really understand why we would need it to be sorted

IF YOU THINK I MISSED SOMETHING ASK AND I WILL TRY TO CLEAR UP THE QUESTION FURTHER.

here is my code

import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;


public class test2 {

int test(int[] a){
Set<Integer> s=new HashSet<Integer>();

for(int i=0;i<a.length;i++){
s.add(Math.abs(a[i]));

}
return s.size();

}

public static void main(String[] args) {
test2 t=new test2();
int[] a={1,1,1,2,-1};
System.out.println(t.test(a));

}

}

Answer

If the array is sorted you can find duplicates by looking a neightbours. To compare absolute values to need to start at both the start and the end. This avoid creating a new structure.

EDIT: IMHO HashMap/HashSet is O(log(log(n)) due to collisions, it is only O(1) if there is a perfect hash function. I would have thought not creating object which be much much faster but appears to be only 4x fast on my machine.

In summary, you can see that using a Set is simpler, clearer and easier to maintain. It is still very fast and would be the best solution in 98% of cases.

public static void main(String[] args) throws Exception {
    for (int len : new int[]{100 * 1000 * 1000, 10 * 1000 * 1000, 1000 * 1000, 100 * 1000, 10 * 1000, 1000}) {
        int[] nums = new int[len];
        for (int i = 0; i < len; i++)
            nums[i] = (int) (Math.random() * (Math.random() * 2001 - 1000));
        Arrays.sort(nums);

        long timeArray = 0;
        long timeSet = 0;
        int runs = len > 1000 * 1000 ? 10 : len >= 100 * 1000 ? 100 : 1000;
        for (int i = 0; i < runs; i++) {
            long time1 = System.nanoTime();
            int count = countDistinct(nums);
            long time2 = System.nanoTime();
            int count2 = countDistinctUsingSet(nums);
            long time3 = System.nanoTime();
            timeArray += time2 - time1;
            timeSet += time3 - time2;
            assert count == count2;
        }
        System.out.printf("For %,d numbers, using an array took %,d us on average, using a Set took %,d us on average, ratio=%.1f%n",
                len, timeArray / 1000 / runs, timeSet / 1000 / runs, 1.0 * timeSet / timeArray);
    }
}

private static int countDistinct(int[] nums) {
    int lastLeft = Math.abs(nums[0]);
    int lastRight = Math.abs(nums[nums.length - 1]);
    int count = 0;
    for (int a = 1, b = nums.length - 2; a <= b;) {
        int left = Math.abs(nums[a]);
        int right = Math.abs(nums[b]);
        if (left == lastLeft) {
            a++;
            lastLeft = left;
        } else if (right == lastRight) {
            b--;
            lastRight = right;
        } else if (lastLeft == lastRight) {
            a++;
            b--;
            lastLeft = left;
            lastRight = right;
            count++;
        } else if (lastLeft > lastRight) {
            count++;
            a++;
            lastLeft = left;
        } else {
            count++;
            b--;
            lastRight = right;
        }
    }
    count += (lastLeft == lastRight ? 1 : 2);
    return count;
}

private static int countDistinctUsingSet(int[] nums) {
    Set<Integer> s = new HashSet<Integer>();
    for (int n : nums)
        s.add(Math.abs(n));
    int count = s.size();
    return count;
}

prints

For 100,000,000 numbers, using an array took 279,623 us on average, using a Set took 1,270,029 us on average, ratio=4.5

For 10,000,000 numbers, using an array took 28,525 us on average, using a Set took 126,591 us on average, ratio=4.4

For 1,000,000 numbers, using an array took 2,846 us on average, using a Set took 12,131 us on average, ratio=4.3

For 100,000 numbers, using an array took 297 us on average, using a Set took 1,239 us on average, ratio=4.2

For 10,000 numbers, using an array took 42 us on average, using a Set took 156 us on average, ratio=3.7

For 1,000 numbers, using an array took 8 us on average, using a Set took 30 us on average, ratio=3.6


On @Kevin K's point, even Integer can have collision even through it's hash values are unique, it can map to the same bucket as the capacity is limited.

public static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

public static void main(String[] args) throws Exception {
    Map<Integer, Integer> map = new HashMap<Integer, Integer>(32, 2.0f);
    for (int i = 0; i < 10000 && map.size() < 32 * 2; i++) {
        if (hash(i) % 32 == 0)
            map.put(i, i);
    }
    System.out.println(map.keySet());
}

prints

[2032, 2002, 1972, 1942, 1913, 1883, 1853, 1823, 1763, 1729, 1703, 1669, 1642, 1608, 1582, 1548, 1524, 1494, 1456, 1426, 1405, 1375, 1337, 1307, 1255, 1221, 1187, 1153, 1134, 1100, 1066, 1032, 1016, 986, 956, 926, 881, 851, 821, 791, 747, 713, 687, 653, 610, 576, 550, 516, 508, 478, 440, 410, 373, 343, 305, 275, 239, 205, 171, 137, 102, 68, 34, 0]

The values are in reverse order because the HashMap has generated into a LinkedList.