AK47 AK47 - 30 days ago 13
Python Question

Big-O of list comprehension that calls max in the condition: O(n) or O(n^2)?

Q. Write an algorithm that returns the second largest number in an array

a = [1, 2, 3, 4, 5]
print(max([x for x in a if x != max(a)]))
>> 4


I'm trying to figure out how this algorithm works and whether or not pythons internal magic will make this as efficient as writing a linear algorithm which just loops over the list
a
once and stores the highest and second highest values.

Correct me if I'm wrong:


  • The call to
    max(a)
    would be O(n)

  • [x for x in a]
    would also be O(n)



Would python be smart enough to cache the value of
max(a)
or would this mean that the list comprehension part the algorithm is O(n^2)?

And then the final
max([listcomp])
would be another O(n), but this would only run once after the comprehension is finished, so the final algorithm would be O(n^2)?

Is there any fancy business going on internally that would cache the
max(a)
value and result in this algorithm working out quicker than O(n^2)?

Answer Source

The easy way to find out is timing it. Consider this timing code:

for i in range(1, 5):
    a = list(range(10**i))
    %timeit max([x for x in a if x != max(a)])
17.6 µs ± 178 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
698 µs ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
61.6 ms ± 340 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
6.31 s ± 167 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Each time it multiplied the number of elements by 10 the runtime increased by 100. That's almost certainly O(n**2). For an O(n) algorithm the runtime would increase linearly with the number of elements:

for i in range(1, 6):
    a = list(range(10**i))
    max_ = max(a)
    %timeit max([x for x in a if x != max_])
4.82 µs ± 27.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
29 µs ± 161 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
262 µs ± 3.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.42 ms ± 13 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
24.9 ms ± 231 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

But I'm not certain the algorithm really does what is asked. Consider the list a=[1,3,3] even the heapq module tells me that the second largest element is 3 (not 1 - what your algorithm returns):

import heapq

>>> heapq.nlargest(2, [1,3,3])[0]
3