vishes_shell vishes_shell - 15 days ago 6
Python Question

'{0}'.format() is faster than str() and '{}'.format() using IPython %timeit and otherwise using pure Python

So this is a CPython thing; I'm not sure whether other implementations behave the same way.

But '{0}'.format() is faster than str() and '{}'.format(). The results are from Python 3.5.2, but I tried it with Python 2.7.12 and the trend is the same.

%timeit q=['{0}'.format(i) for i in range(100, 100000, 100)]
%timeit q=[str(i) for i in range(100, 100000, 100)]
%timeit q=['{}'.format(i) for i in range(100, 100000, 100)]

1000 loops, best of 3: 231 µs per loop
1000 loops, best of 3: 298 µs per loop
1000 loops, best of 3: 434 µs per loop


From the docs of object.__str__(self):

Called by str(object) and the built-in functions format() and print() to compute the “informal” or nicely printable string representation of an object.


So str() and format() call the same object.__str__(self) method, but where does that difference in speed come from?
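The claim from the docs can be checked with a small sketch. The Traced class below is hypothetical and only counts how often its __str__ runs; it relies on the default object.__format__, which returns str(self) for an empty format spec. Built-in types such as int define their own __format__, so the path taken for the integers in the benchmark may differ.

```python
class Traced:
    """Hypothetical helper that counts calls to its __str__."""

    def __init__(self):
        self.calls = 0

    def __str__(self):
        self.calls += 1
        return "traced"


obj = Traced()
results = [
    str(obj),            # direct call
    "{}".format(obj),    # automatic numbering
    "{0}".format(obj),   # explicit numbering
    format(obj),         # built-in format() with an empty spec
]

print(results)    # ['traced', 'traced', 'traced', 'traced']
print(obj.calls)  # 4
```

All four spellings end up in the same __str__, so any speed difference has to come from the work done before that call.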

UPDATE
As @StefanPochmann and @Leon noted in the comments, they got different results. I tried running it with python -m timeit "..." and they are right; the results are:

$ python3 -m timeit "['{0}'.format(i) for i in range(100, 100000, 100)]"
1000 loops, best of 3: 441 usec per loop

$ python3 -m timeit "[str(i) for i in range(100, 100000, 100)]"
1000 loops, best of 3: 297 usec per loop

$ python3 -m timeit "['{}'.format(i) for i in range(100, 100000, 100)]"
1000 loops, best of 3: 420 usec per loop


So it seems that IPython is doing something strange...

NEW QUESTION: What is the preferred way, by speed, to convert an object to a str object?

Answer

The IPython timing is probably off for some reason (though, when tested with a longer format string in different cells, it behaved slightly better). "{}" is a bit faster than "{num}", which is faster than "{name}", while they're all slower than str.

str(val) is the fastest way to transform an object to a string; it directly gets the object's __str__, if one exists, and returns its result. Others, like format, include additional overhead: an extra function call (to format itself), handling any arguments, parsing the format string, and only then invoking the __str__ of their args.
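To reproduce the comparison outside IPython, here is a minimal sketch using the standard timeit module. The best_time helper and the candidates dict are names made up for this example, and the absolute numbers will vary by machine and Python version, so no expected timings are shown.

```python
import timeit

values = range(100, 100000, 100)

# Each candidate builds the same list of strings in a different way.
candidates = {
    "str(i)": lambda: [str(i) for i in values],
    "'{}'.format(i)": lambda: ['{}'.format(i) for i in values],
    "'{0}'.format(i)": lambda: ['{0}'.format(i) for i in values],
}


def best_time(func, repeat=3, number=100):
    """Return the best per-call time in seconds over `repeat` runs."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


for label, func in candidates.items():
    usec = best_time(func) * 1e6
    print("{:<18} {:8.1f} usec per loop".format(label, usec))
```

Taking the minimum over several repeats mirrors the "best of 3" figure that python -m timeit reports.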

For the str.format method, "{}" uses automatic numbering. From a small section in the docs on the format syntax:

Changed in version 3.1: The positional argument specifiers can be omitted, so '{} {}' is equivalent to '{0} {1}'.

that is, if you supply a string of the form:

"{}{}{}".format(1, 2, 3)

CPython immediately knows that this is equivalent to:

"{0}{1}{2}".format(1, 2, 3)

With a format string that contains explicit numbers, CPython can't assume a strictly increasing sequence starting from 0 and must parse every single bracket in order to get it right, slowing things down a bit in the process:

"{1}{2}{0}".format(1, 2, 3)

That's why it also is not allowed to mix these two together:

"{1}{}{2}".format(1, 2, 3)

you'll get a nice ValueError back:

ValueError: cannot switch from manual field specification to automatic field numbering
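Mixing the two styles fails in either order, which can be checked with a short sketch. The try_format helper is hypothetical and simply turns the exception into a printable string; the wording of the message depends on which style appears first in the format string.

```python
def try_format(template, *args):
    """Format `template`, returning the error text instead of raising."""
    try:
        return template.format(*args)
    except ValueError as exc:
        return "ValueError: {}".format(exc)


manual_then_auto = try_format("{1}{}{2}", 1, 2, 3)
auto_then_manual = try_format("{}{1}", 1, 2)
auto_only = try_format("{}{}{}", 1, 2, 3)

print(manual_then_auto)
print(auto_then_manual)
print(auto_only)  # 123
```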

It also grabs these positionals with PySequence_GetItem, which is fast (in comparison to PyObject_GetItem [see next]).
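The difference between automatic and explicit numbering is visible in how a format string parses. The sketch below uses string.Formatter.parse from the standard library to list the field names the parser sees; field_names is a helper name invented for this example, and it is only an approximation of what str.format does internally in C.

```python
from string import Formatter


def field_names(format_string):
    """Return the field name of every replacement field, in order."""
    parsed = Formatter().parse(format_string)
    return [name for _, name, _, _ in parsed if name is not None]


print(field_names("{}{}{}"))     # ['', '', '']
print(field_names("{0}{1}{2}"))  # ['0', '1', '2']
print(field_names("{1}{2}{0}"))  # ['1', '2', '0']

# Automatic and explicit in-order numbering produce the same string.
print("{}{}{}".format(1, 2, 3) == "{0}{1}{2}".format(1, 2, 3))  # True
```

An empty field name is what signals automatic numbering; a non-empty one has to be converted to an index before the argument can be looked up.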

For "{name}" values, CPython always has extra work to do due to the fact that we're dealing with keyword arguments rather than positional ones; this includes things like building the dictionary for the calls and generating way more LOAD bytecode instructions for loading keys and values. The keyword form of function calling always introduces some overhead. In addition, it seems that the grabbing actually uses PyObject_GetItem which incurs some extra overhead due to its generic nature.
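One way to look at the call overhead is to compare the bytecode of a positional and a keyword call with the dis module. The two functions and the opnames helper below are hypothetical names for this sketch, and the exact instructions differ between CPython versions, so the output is not reproduced here.

```python
import dis


def positional(value):
    return "{0}".format(value)


def keyword(value):
    return "{name}".format(name=value)


def opnames(func):
    """Return the opcode names of a function's bytecode, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]


for func in (positional, keyword):
    print(func.__name__, opnames(func))
```

Both functions return the same string; only the way the argument reaches format differs.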
