I'm going over the following coding problem:
You’re given a read only array of n integers. Find out if any integer
occurs more than n/3 times in the array in linear time and constant
If so, return the integer. If not, return -1.
If there are multiple solutions, return any one.
Input : [1 2 3 1 1]
Output : 1
1 occurs 3 times which is more than 5/3 times.
# @param A : tuple of integers
# @return an integer
def repeatedNumber(self, A):
tracking = dict()
for i in A:
if i in tracking:
tracking[i] += 1
tracking[i] = 1
results = 
for k in tracking:
if tracking[k] > len(A)/3:
return -1 if len(results) < 1 else results
Your algorithm is correct, but uses more than O(1) space, so it does not fit with the requirements.
This problem is a variation on the majority vote problem, where the occurrence factor must be more than n/2 instead of n/3.
For the n/2 case the Boyer–Moore majority vote algorithm can be used, so I have been looking into finding a variation of this algorithm that would work for the n/3 case.
The counter which is used in Boyer-Moore would have to remain positive for a longer time to ensure a frequent value "survives" long enough to be recognised as a solution. This can be achieved by incrementing the counter with 2 when the value matches the currently retained value, and decrementing it by 1 (like in Boyer-More) when it differs from it.
Although that looks promising, it will fail identifying candidates that occur while the counter is decrementing, since the number of decrements can be 2n/3. The solution I propose to prevent this from happening is to use several counters, which work almost independently: they each track a different value. When one of them drops to zero, and the current value is not one of the other values being tracked, the zero-counter is bound to this new value, and starts incrementing again (with 2 at a time).
I found that with 4 such parallel counters -- and corresponding tracked values -- the algorithm never misses a good candidate.
When the end of the array is reached, the algorithm will have 4 candidates (if the array has at least 4 elements of course). The second sweep which is done in Boyer-More, is then also performed here, but by counting the occurrences of each of the 4 candidates. If any of those values occurs more than n/3 times, the solution has been found. If not, it is concluded there is no such value in the array.
Compared with the Boyer-More algorithm, this variation of it uses 4 times more time and 4 times more space, but that does not change the complexity: it still spends O(n) time and uses O(1) space.
Here is the code:
def commonNumber(a): counts = [0, 0, 0, 0] values = [None, None, None, None] for value in a: is_not_tracked = value not in values for i, count in enumerate(counts): if count == 0 and is_not_tracked: values[i] = value counts[i] += 2 if value == values[i] else -1 if count else 0 # second phase: count the occurrences for each of the 4 candidates for value in values: if a.count(value) > len(a) // 3: return value return None
I tested this code with many arrays having a solution. Such arrays can be built by having -1 occurring just the minimum number of times needed to be considered a solution, then -2 occurring one time less, the value -3 once, and then a list of non-negative random values to fill up the remainder. That remainder list has fewer elements than needed for a value to be a solution, so such arrays always have -1 as solution.
Here is one of the code blocks I have tested with:
import random size = 100 repeat = size//3 +1 a = [-1]*repeat + [-2]*(repeat-1) + [-3] + [int(i*random.random()) for i in range(size-2*repeat)] for i in range(1000): random.shuffle(a) result = commonNumber(a) # Verify that the outcome is as expected: it should be value -1 if result != -1: print('fail:', a, result) print('done')