zeo2396 zeo2396 - 11 days ago 5
Python Question

How to find value that occurs more than N/3 times in linear time and constant space

I'm going over the following coding problem:


You’re given a read only array of n integers. Find out if any integer
occurs more than n/3 times in the array in linear time and constant
additional space.

If so, return the integer. If not, return -1.

If there are multiple solutions, return any one.

Example :

Input : [1 2 3 1 1]

Output : 1

1 occurs 3 times which is more than 5/3 times.


I was able to solve it with this Python code:

class Solution:
# @param A : tuple of integers
# @return an integer
def repeatedNumber(self, A):
tracking = dict()

for i in A:
if i in tracking:
tracking[i] += 1
else:
tracking[i] = 1
#print tracking
results = []
for k in tracking:
if tracking[k] > len(A)/3:
results.append(k)
#print results
return -1 if len(results) < 1 else results[0]


The solution passes the test cases, but does it meet the requirements? And if not, how can it be made to meet them?

Answer

Your algorithm is correct, but uses more than O(1) space, so it does not fit with the requirements.

This problem is a variation on the majority vote problem, where the occurrence factor must be more than n/2 instead of n/3.

For the n/2 case the Boyer–Moore majority vote algorithm can be used, so I have been looking into finding a variation of this algorithm that would work for the n/3 case.

The counter which is used in Boyer-Moore would have to remain positive for a longer time to ensure a frequent value "survives" long enough to be recognised as a solution. This can be achieved by incrementing the counter with 2 when the value matches the currently retained value, and decrementing it by 1 (like in Boyer-More) when it differs from it.

Although that looks promising, it will fail identifying candidates that occur while the counter is decrementing, since the number of decrements can be 2n/3. The solution I propose to prevent this from happening is to use several counters, which work almost independently: they each track a different value. When one of them drops to zero, and the current value is not one of the other values being tracked, the zero-counter is bound to this new value, and starts incrementing again (with 2 at a time).

I found that with 4 such parallel counters -- and corresponding tracked values -- the algorithm never misses a good candidate.

When the end of the array is reached, the algorithm will have 4 candidates (if the array has at least 4 elements of course). The second sweep which is done in Boyer-More, is then also performed here, but by counting the occurrences of each of the 4 candidates. If any of those values occurs more than n/3 times, the solution has been found. If not, it is concluded there is no such value in the array.

Compared with the Boyer-More algorithm, this variation of it uses 4 times more time and 4 times more space, but that does not change the complexity: it still spends O(n) time and uses O(1) space.

Here is the code:

def commonNumber(a):
    counts = [0, 0, 0, 0]
    values = [None, None, None, None]
    for value in a:
        is_not_tracked = value not in values
        for i, count in enumerate(counts):
            if count == 0 and is_not_tracked:
                values[i] = value
            counts[i] += 2 if value == values[i] else -1 if count else 0
    # second phase: count the occurrences for each of the 4 candidates
    for value in values:
        if a.count(value) > len(a) // 3:
            return value
    return None

I tested this code with many arrays having a solution. Such arrays can be built by having -1 occurring just the minimum number of times needed to be considered a solution, then -2 occurring one time less, the value -3 once, and then a list of non-negative random values to fill up the remainder. That remainder list has fewer elements than needed for a value to be a solution, so such arrays always have -1 as solution.

Here is one of the code blocks I have tested with:

import random
size = 100
repeat = size//3 +1
a = [-1]*repeat + [-2]*(repeat-1) + [-3] + [int(i*random.random()) for i in range(size-2*repeat)]
for i in range(1000):
    random.shuffle(a)
    result = commonNumber(a)
    # Verify that the outcome is as expected: it should be value -1
    if result != -1: 
        print('fail:', a, result)
print('done')