Mango Mango - 6 months ago 11
Python Question

Would it be better to use "if x in (y, z)" over "if x == y or x == z"?

Given this simple condition:

if x == y or x == z:
    print("Hello World!")


I understand that Python would first check whether x is equal to y, and if x is not equal to y, it would then check whether x is equal to z, printing Hello World! if at least one of the conditions is True.

If I were to do this instead:

if x in (y, z):
    print("Hello World!")


To my understanding, Python would iterate through the "yz" tuple and then print Hello World! if the value of x is in the "yz" tuple.

Which method would be faster / more efficient to use?

Would Python not bother to check if x was equal to z if x was equal to y?

Would Python still execute the code in the if statement if x was equal to y but not z?

Thank you in advance.

Answer

Let's test it out ourselves.

Here is a class that overloads the equality operator to let us see what Python is doing:

class Foo:
  def __init__(self, name):
    self.name = name

  def __eq__(self, other):
    # Report each comparison so we can see which ones Python performs
    print(self.name, "==", other.name, "?")
    return self.name == other.name

Let's test out short circuiting:

# x and a are passed the same string because we want x == a to be True
x = Foo("a")
a, b = Foo("a"), Foo("b")
if x in (a, b):
  print("Hello World!")

For me, this outputs:

a == a ?
Hello World!

So short-circuiting works as desired: only the first comparison runs, and the block is still executed.
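The test above only exercises the in form. As a rough sketch, assuming Python 3, the same idea can be applied to both forms by counting __eq__ calls instead of printing them. The calls counter and the count_eq helper are hypothetical names added for illustration and are not part of the original class. The last line also shows that in checks identity before equality, so __eq__ is not called at all when the tuple contains x itself.

```python
class Foo:
    calls = 0  # hypothetical counter: number of __eq__ calls so far

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        Foo.calls += 1
        return self.name == other.name


def count_eq(expr):
    """Run a zero-argument callable; return (result, number of __eq__ calls)."""
    Foo.calls = 0
    result = expr()
    return result, Foo.calls


x = Foo("a")
a, b, c = Foo("a"), Foo("b"), Foo("c")

# Match comes first: both forms stop after one comparison.
print(count_eq(lambda: x == a or x == b))
print(count_eq(lambda: x in (a, b)))

# Match comes second: both forms need two comparisons.
print(count_eq(lambda: x == b or x == a))
print(count_eq(lambda: x in (b, a)))

# No match: two comparisons, and the result is False.
print(count_eq(lambda: x in (b, c)))

# x itself is in the tuple: identity is checked first, so __eq__ never runs.
print(count_eq(lambda: x in (x, b)))
```

In every case the result of the in form matches the result of the or form; the only behavioural difference is the identity shortcut in the last case.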

Now for speed. If we modify the above __eq__ method to remove the print statement (to avoid I/O in our benchmark) and use IPython's %timeit magic command, we can test it this way:

c = Foo("c") # for comparison when x is not equal to either case
%timeit x in (a, b)
%timeit (x == a or x == b)
%timeit x in (b, a) # non-short-circuiting
%timeit (x == b or x == a)
%timeit x in (b, c) # not equal to either case
%timeit (x == b or x == c)

This yields:

1000000 loops, best of 3: 437 ns per loop
1000000 loops, best of 3: 397 ns per loop
1000000 loops, best of 3: 796 ns per loop
1000000 loops, best of 3: 819 ns per loop
1000000 loops, best of 3: 779 ns per loop
1000000 loops, best of 3: 787 ns per loop
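For anyone without IPython, a similar measurement can be sketched with the standard library's timeit module. This is only an approximation of the %timeit runs above, assuming Python 3.5 or later (for the globals argument); the loop count and the namespace, statements, and timings names are arbitrary choices for illustration, and the numbers will differ from machine to machine.

```python
import timeit


class Foo:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # No print here, to keep I/O out of the benchmark
        return self.name == other.name


x = Foo("a")
a, b, c = Foo("a"), Foo("b"), Foo("c")
namespace = {"x": x, "a": a, "b": b, "c": c}

statements = [
    "x in (a, b)",
    "x == a or x == b",
    "x in (b, a)",
    "x == b or x == a",
    "x in (b, c)",
    "x == b or x == c",
]

loops = 50_000  # arbitrary; raise it for steadier numbers
timings = {}
for stmt in statements:
    # repeat() returns the total time of `loops` runs, once per repeat;
    # the minimum is the least noisy estimate.
    best = min(timeit.repeat(stmt, globals=namespace, number=loops, repeat=3))
    timings[stmt] = best / loops
    print(f"{stmt:<18} {timings[stmt] * 1e9:8.1f} ns per loop")
```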

So, pretty comparable results. Running multiple times, I consistently get slightly faster results in the short-circuiting test (testing a before b) with the x == a or x == b method. However, the difference isn't big enough to worry about. Just use whichever is most readable on a case-by-case basis :)