Kevin Li - 1 year ago 70
Python Question

# why is this o(n) three-way set disjointness algorithm slower than then o(n^3) version?

O(n) because converting list to set is O(n) time, getting intersection is O(n) time and len is O(n)

``````def disjoint3c(A, B, C):
"""Return True if there is no element common to all three lists."""
return len(set(A) & set(B) & set(C)) == 0
``````

or similarly, should be clearly O(N)

``````def set_disjoint_medium (a, b, c):
a, b, c = set(a), set(b), set(c)
for elem in a:
if elem in b and elem in c:
return False
return True
``````

yet this O(n^3) code:

``````def set_disjoint_slowest (a, b, c):
for e1 in a:
for e2 in b:
for e3 in c:
if e1 == e2 == e3:
return False
return True
``````

runs faster

see time where algorithm one is the n^3, and algorithm three is the O(n) set code... algorithm two is actually n^2 where we optimize algorithm one by checking for disjointness before the third loop starts

``````Size Input (n):  10000

Algorithm One: 0.014993906021118164

Algorithm Two: 0.013481855392456055

Algorithm Three: 0.01955580711364746

Size Input (n):  100000

Algorithm One: 0.15916991233825684

Algorithm Two: 0.1279449462890625

Algorithm Three: 0.18677806854248047

Size Input (n):  1000000

Algorithm One: 1.581618070602417

Algorithm Two: 1.146049976348877

Algorithm Three: 1.8179030418395996
``````

Here is the setup I used for testing the speed of the code.

``````import random

# Collapsed these because already known
def disjoint3c(A, B, C):
def set_disjoint_medium (a, b, c):
def set_disjoint_slowest (a, b, c):

a = [random.randrange(100) for i in xrange(10000)]
b = [random.randrange(100) for i in xrange(10000)]
c = [random.randrange(100) for i in xrange(10000)]

# Ran timeit.
# Results with timeit module.
1-) 0.00635750419422
2-) 0.0061145967287
3-) 0.0487953200969
``````

Now to the results, as you see, the `O(n^3)` solution runs 8 times slower than the other solutions. But this is still fast for such an algorithm(Even faster in your test). Why this happens ?

Because medium and slowest solutions you used, finishes the execution of the code as soon as a common element is detected. So the full complexity of the code is not realized. It breaks as soon as it finds an answer. Why the slowest solution ran almost as fast as the other ones in your test ? Probably because it finds the answer closer to the beginning of the lists.

To test this, you could create the lists like this. Try this yourself.

``````a = range(1000)
b = range(1000, 2000)
c = range(2000, 3000)
``````

Now the real difference between the times will be obvious because the slowest solution will have to run until it finishes all iterations, because there is no common element.

So it is a situation of Worst case and Best case performance.

Not a part of the question edit: So, what if you want to retain the speed of finding early common occurances, but also don't want to increase complexity. I made a rough solution for that, maybe more experienced users can suggest faster code.

``````def mysol(a, b, c):
store = [set(), set(), set()]

# zip_longest for Python3, not izip_longest.
for i, j, k in itertools.izip_longest(a, b, c):