OrangeFlash81 - 2 months ago
Python Question

Why are literal formatted strings so slow in Python 3.6 alpha?

I've downloaded a Python 3.6 alpha build from the Python GitHub repository, and one of my favourite new features is literal string formatting. It can be used like so:

>>> x = 2
>>> f"x is {x}"
'x is 2'


This appears to do the same thing as using the format function on a str instance. However, one thing that I've noticed is that this literal string formatting is actually very slow compared to just calling format. Here's what timeit says about each method:

>>> import timeit
>>> x = 2
>>> timeit.timeit(lambda: f"X is {x}")
0.8658502227130764
>>> timeit.timeit(lambda: "X is {}".format(x))
0.5500578542015617


If I use a string as timeit's argument, my results still show the same pattern:

>>> timeit.timeit('x = 2; f"X is {x}"')
0.5786435347381484
>>> timeit.timeit('x = 2; "X is {}".format(x)')
0.4145195760771685


As you can see, using format takes roughly a third less time. I would expect the literal method to be faster because less syntax is involved. What is going on behind the scenes that makes the literal method so much slower?

Answer

The f"..." syntax is effectively converted to a str.join() operation on the literal string parts around the {...} expressions, with the result of each expression passed through its __format__() method (along with any :... format specification). You can see this when disassembling:

>>> import dis
>>> dis.dis(compile('f"X is {x}"', '', 'exec'))
  1           0 LOAD_CONST               0 ('')
              3 LOAD_ATTR                0 (join)
              6 LOAD_CONST               1 ('X is ')
              9 LOAD_NAME                1 (x)
             12 FORMAT_VALUE             0
             15 BUILD_LIST               2
             18 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             21 POP_TOP
             22 LOAD_CONST               2 (None)
             25 RETURN_VALUE
>>> dis.dis(compile('"X is {}".format(x)', '', 'exec'))
  1           0 LOAD_CONST               0 ('X is {}')
              3 LOAD_ATTR                0 (format)
              6 LOAD_NAME                1 (x)
              9 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             12 POP_TOP
             13 LOAD_CONST               1 (None)
             16 RETURN_VALUE

Note the BUILD_LIST and LOAD_ATTR .. (join) op-codes in that result. The new FORMAT_VALUE op-code takes the top of the stack plus a format specification (parsed out at compile time) and combines them in an object.__format__() call.
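
Because the byte code shown above comes from an alpha build, it may be worth checking what your own interpreter emits. The following is a minimal sketch, assuming Python 3.6 or later; opcode_names is a hypothetical helper introduced here for illustration, and the op-code names it prints can differ between Python versions.

```python
import dis


def opcode_names(source):
    """Return the op-code names the running interpreter emits for `source`.

    Hypothetical helper for illustration; `source` must be a single expression.
    """
    code = compile(source, "<example>", "eval")
    return [instr.opname for instr in dis.get_instructions(code)]


# The name x does not need to exist: the code is compiled, never executed.
fstring_ops = opcode_names('f"X is {x}"')
format_ops = opcode_names('"X is {}".format(x)')

print("f-string:  ", fstring_ops)
print("str.format:", format_ops)
```

Comparing the two lists shows whether the f-string on your build still goes through a list and a join call, or through something else.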

So your example, f"X is {x}", is translated to:

''.join(["X is ", x.__format__('')])

Note that this requires Python to create a list object, and call the str.join() method.
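
To see that the three spellings produce the same string and all go through __format__(), here is a small sketch, assuming Python 3.6 or later. Probe is a hypothetical class made up for this illustration; it records every format specification it is handed and always renders as "2".

```python
class Probe:
    """Hypothetical helper that records each format spec it receives."""

    def __init__(self):
        self.specs = []

    def __format__(self, spec):
        self.specs.append(spec)
        return "2"  # the spec is deliberately ignored in this sketch


x = Probe()

literal = f"X is {x}"                          # formatted string literal
joined = "".join(["X is ", x.__format__("")])  # the hand-written translation
called = "X is {}".format(x)                   # str.format()
with_spec = f"X is {x:>5}"                     # the spec after ':' is passed through

print(literal, joined, called, sep=" | ")
print(x.specs)
```

The first three calls pass an empty format specification; only the last one passes '>5'.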

The str.format() call is also a method call, and after parsing there is still a call to x.__format__('') involved, but crucially, there is no list creation involved here. It is this difference that makes the str.format() method faster.
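
One way to check this on your own build is to time the hand-written join next to the other two forms. This is only a sketch, assuming Python 3.6 or later; time_variants and its labels are hypothetical names chosen for the example, and the numbers depend heavily on the interpreter version and machine.

```python
import timeit


def time_variants(number=100_000):
    """Time three ways of building the same string (hypothetical helper)."""
    namespace = {"x": 2}  # the statements below look up x here
    statements = {
        "f-string": 'f"X is {x}"',
        "str.format": '"X is {}".format(x)',
        "list + join": '"".join(["X is ", x.__format__("")])',
    }
    return {
        label: timeit.timeit(stmt, globals=namespace, number=number)
        for label, stmt in statements.items()
    }


if __name__ == "__main__":
    for label, seconds in time_variants().items():
        print(f"{label:12} {seconds:.4f} s")
```

Passing the statements as strings with a globals mapping avoids the extra function-call overhead that a lambda adds to every iteration.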

Note that Python 3.6 has only been released as an alpha build; this implementation can still easily change. See PEP 494 – Python 3.6 Release Schedule for the timetable, as well as Python issue #27078 (opened in response to this question) for a discussion on how to further improve the performance of formatted string literals.
