Kevin Li Kevin Li - 1 year ago 70
Python Question

why is this o(n) three-way set disjointness algorithm slower than then o(n^3) version?

O(n) because converting list to set is O(n) time, getting intersection is O(n) time and len is O(n)

def disjoint3c(A, B, C):
"""Return True if there is no element common to all three lists."""
return len(set(A) & set(B) & set(C)) == 0

or similarly, should be clearly O(N)

def set_disjoint_medium (a, b, c):
a, b, c = set(a), set(b), set(c)
for elem in a:
if elem in b and elem in c:
return False
return True

yet this O(n^3) code:

def set_disjoint_slowest (a, b, c):
for e1 in a:
for e2 in b:
for e3 in c:
if e1 == e2 == e3:
return False
return True

runs faster

see time where algorithm one is the n^3, and algorithm three is the O(n) set code... algorithm two is actually n^2 where we optimize algorithm one by checking for disjointness before the third loop starts

Size Input (n): 10000

Algorithm One: 0.014993906021118164

Algorithm Two: 0.013481855392456055

Algorithm Three: 0.01955580711364746

Size Input (n): 100000

Algorithm One: 0.15916991233825684

Algorithm Two: 0.1279449462890625

Algorithm Three: 0.18677806854248047

Size Input (n): 1000000

Algorithm One: 1.581618070602417

Algorithm Two: 1.146049976348877

Algorithm Three: 1.8179030418395996

Answer Source

The comments made clarifications about the Big-Oh notations. So I will just start with testing the code.

Here is the setup I used for testing the speed of the code.

import random

# Collapsed these because already known
def disjoint3c(A, B, C):
def set_disjoint_medium (a, b, c):
def set_disjoint_slowest (a, b, c):

a = [random.randrange(100) for i in xrange(10000)]
b = [random.randrange(100) for i in xrange(10000)]
c = [random.randrange(100) for i in xrange(10000)]

# Ran timeit.
# Results with timeit module.
1-) 0.00635750419422
2-) 0.0061145967287
3-) 0.0487953200969

Now to the results, as you see, the O(n^3) solution runs 8 times slower than the other solutions. But this is still fast for such an algorithm(Even faster in your test). Why this happens ?

Because medium and slowest solutions you used, finishes the execution of the code as soon as a common element is detected. So the full complexity of the code is not realized. It breaks as soon as it finds an answer. Why the slowest solution ran almost as fast as the other ones in your test ? Probably because it finds the answer closer to the beginning of the lists.

To test this, you could create the lists like this. Try this yourself.

a = range(1000)
b = range(1000, 2000)
c = range(2000, 3000)

Now the real difference between the times will be obvious because the slowest solution will have to run until it finishes all iterations, because there is no common element.

So it is a situation of Worst case and Best case performance.

Not a part of the question edit: So, what if you want to retain the speed of finding early common occurances, but also don't want to increase complexity. I made a rough solution for that, maybe more experienced users can suggest faster code.

def mysol(a, b, c):
    store = [set(), set(), set()]

    # zip_longest for Python3, not izip_longest.
    for i, j, k in itertools.izip_longest(a, b, c):
        if i: store[0].add(i)
        if j: store[1].add(j)
        if k: store[2].add(k)

        if (i in store[1] and i in store[2]) or (j in store[0] and i in store[2]) or (k in store[0] and i in store[1]):
            return False
    return True

What is basically being done in this code is, you avoid converting all the lists to sets in the beginning. Rather, iterate through all lists at the same time, add elements to sets, check for common occurances. So now, you keep the speed of finding an early solution, but it is still slow for the worst case that I shown.

For the speeds, this runs 3-4 times slower than your first two solutions in the worst case. But runs 4-10 times faster than those solutions in randomized lists.

Note: The fact that you are finding all common elements in three lists( in the first solution) unquestionably means that there is a faster solution by theory. Because you just need to know if there is even a single common element, and that knowledge is enough.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download