AK47 - 6 months ago 43
Python Question

# Big-O of list comprehension that calls max in the condition: O(n) or O(n^2)?

Q. Write an algorithm that returns the second largest number in an array

``````a = [1, 2, 3, 4, 5]
print(max([x for x in a if x != max(a)]))
>> 4
``````

I'm trying to figure out how this algorithm works and whether or not pythons internal magic will make this as efficient as writing a linear algorithm which just loops over the list
`a`
once and stores the highest and second highest values.

Correct me if I'm wrong:

• The call to
`max(a)`
would be O(n)

• `[x for x in a]`
would also be O(n)

Would python be smart enough to cache the value of
`max(a)`
or would this mean that the list comprehension part the algorithm is O(n^2)?

And then the final
`max([listcomp])`
would be another O(n), but this would only run once after the comprehension is finished, so the final algorithm would be O(n^2)?

Is there any fancy business going on internally that would cache the
`max(a)`
value and result in this algorithm working out quicker than O(n^2)?

The easy way to find out is timing it. Consider this timing code:

``````for i in range(1, 5):
a = list(range(10**i))
%timeit max([x for x in a if x != max(a)])
``````
```17.6 µs ± 178 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
698 µs ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
61.6 ms ± 340 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
6.31 s ± 167 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)```

Each time it multiplied the number of elements by 10 the runtime increased by 100. That's almost certainly `O(n**2)`. For an `O(n)` algorithm the runtime would increase linearly with the number of elements:

``````for i in range(1, 6):
a = list(range(10**i))
max_ = max(a)
%timeit max([x for x in a if x != max_])
``````
```4.82 µs ± 27.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
29 µs ± 161 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
262 µs ± 3.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.42 ms ± 13 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
24.9 ms ± 231 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)```

But I'm not certain the algorithm really does what is asked. Consider the list `a=[1,3,3]` even the `heapq` module tells me that the second largest element is `3` (not 1 - what your algorithm returns):

``````import heapq

>>> heapq.nlargest(2, [1,3,3])[0]
3
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download