yahh - 6 months ago 43

Java Question

so i took the codility interview test yesterday and was informed today that i failed, unfortunately i wasnt given any other information by either codility nor the employer as to where i screwed up so i would appreciate some help in knowing where i went wrong. i know codility pays alot of emphasis on how fast the program runs and how it behaves for large numbers. now i didnt copy paste the questions so this the approx of what i remember

- count the number of elements in an array a which are absolute distinct, what it means is if an array had -3 and 3 in it these numbers are not distinct because|-3|=|3|. i think an example would clear it up better

a={-5,-3,0,1,-3}

the result would be 4 because there are 4 absolute distinct elements in this array.

The question also stated that a.length would be <=10000, and most importantly it stated that assume that the

`import java.util.HashMap;`

import java.util.HashSet;

import java.util.Set;

public class test2 {

int test(int[] a){

Set<Integer> s=new HashSet<Integer>();

for(int i=0;i<a.length;i++){

s.add(Math.abs(a[i]));

}

return s.size();

}

public static void main(String[] args) {

test2 t=new test2();

int[] a={1,1,1,2,-1};

System.out.println(t.test(a));

}

}

Answer

If the array is sorted you can find duplicates by looking a neightbours. To compare absolute values to need to start at both the start and the end. This avoid creating a new structure.

EDIT: IMHO HashMap/HashSet is O(log(log(n)) due to collisions, it is only O(1) if there is a perfect hash function. I would have thought not creating object which be much much faster but appears to be only 4x fast on my machine.

In summary, you can see that using a Set is simpler, clearer and easier to maintain. It is still very fast and would be the best solution in 98% of cases.

```
public static void main(String[] args) throws Exception {
for (int len : new int[]{100 * 1000 * 1000, 10 * 1000 * 1000, 1000 * 1000, 100 * 1000, 10 * 1000, 1000}) {
int[] nums = new int[len];
for (int i = 0; i < len; i++)
nums[i] = (int) (Math.random() * (Math.random() * 2001 - 1000));
Arrays.sort(nums);
long timeArray = 0;
long timeSet = 0;
int runs = len > 1000 * 1000 ? 10 : len >= 100 * 1000 ? 100 : 1000;
for (int i = 0; i < runs; i++) {
long time1 = System.nanoTime();
int count = countDistinct(nums);
long time2 = System.nanoTime();
int count2 = countDistinctUsingSet(nums);
long time3 = System.nanoTime();
timeArray += time2 - time1;
timeSet += time3 - time2;
assert count == count2;
}
System.out.printf("For %,d numbers, using an array took %,d us on average, using a Set took %,d us on average, ratio=%.1f%n",
len, timeArray / 1000 / runs, timeSet / 1000 / runs, 1.0 * timeSet / timeArray);
}
}
private static int countDistinct(int[] nums) {
int lastLeft = Math.abs(nums[0]);
int lastRight = Math.abs(nums[nums.length - 1]);
int count = 0;
for (int a = 1, b = nums.length - 2; a <= b;) {
int left = Math.abs(nums[a]);
int right = Math.abs(nums[b]);
if (left == lastLeft) {
a++;
lastLeft = left;
} else if (right == lastRight) {
b--;
lastRight = right;
} else if (lastLeft == lastRight) {
a++;
b--;
lastLeft = left;
lastRight = right;
count++;
} else if (lastLeft > lastRight) {
count++;
a++;
lastLeft = left;
} else {
count++;
b--;
lastRight = right;
}
}
count += (lastLeft == lastRight ? 1 : 2);
return count;
}
private static int countDistinctUsingSet(int[] nums) {
Set<Integer> s = new HashSet<Integer>();
for (int n : nums)
s.add(Math.abs(n));
int count = s.size();
return count;
}
```

prints

For 100,000,000 numbers, using an array took 279,623 us on average, using a Set took 1,270,029 us on average, ratio=4.5

For 10,000,000 numbers, using an array took 28,525 us on average, using a Set took 126,591 us on average, ratio=4.4

For 1,000,000 numbers, using an array took 2,846 us on average, using a Set took 12,131 us on average, ratio=4.3

For 100,000 numbers, using an array took 297 us on average, using a Set took 1,239 us on average, ratio=4.2

For 10,000 numbers, using an array took 42 us on average, using a Set took 156 us on average, ratio=3.7

For 1,000 numbers, using an array took 8 us on average, using a Set took 30 us on average, ratio=3.6

On @Kevin K's point, even Integer can have collision even through it's hash values are unique, it can map to the same bucket as the capacity is limited.

```
public static int hash(int h) {
// This function ensures that hashCodes that differ only by
// constant multiples at each bit position have a bounded
// number of collisions (approximately 8 at default load factor).
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}
public static void main(String[] args) throws Exception {
Map<Integer, Integer> map = new HashMap<Integer, Integer>(32, 2.0f);
for (int i = 0; i < 10000 && map.size() < 32 * 2; i++) {
if (hash(i) % 32 == 0)
map.put(i, i);
}
System.out.println(map.keySet());
}
```

prints

[2032, 2002, 1972, 1942, 1913, 1883, 1853, 1823, 1763, 1729, 1703, 1669, 1642, 1608, 1582, 1548, 1524, 1494, 1456, 1426, 1405, 1375, 1337, 1307, 1255, 1221, 1187, 1153, 1134, 1100, 1066, 1032, 1016, 986, 956, 926, 881, 851, 821, 791, 747, 713, 687, 653, 610, 576, 550, 516, 508, 478, 440, 410, 373, 343, 305, 275, 239, 205, 171, 137, 102, 68, 34, 0]

The values are in reverse order because the HashMap has generated into a LinkedList.