Rodney Wells - 7 months ago
Python Question

Why is ''.join() faster than += in Python?

I'm able to find a bevy of information online (on Stack Overflow and elsewhere) about how using += for concatenation in Python is inefficient and bad practice.

I can't seem to find WHY += is so inefficient. Outside of a mention here that "it's been optimized for 20% improvement in certain cases" (still not clear what those cases are), I can't find any additional information.

What is happening on a more technical level that makes ''.join() superior to other Python concatenation methods? Thanks in advance for your insights.


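Let's say you have 3 strings, a, b, and c, and you build the result with s = a, then s += b, then s += c.

When you use +=, Python first needs to allocate and create the intermediate string a + b before it can finally allocate and create a + b + c.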
So for each += that gets called, the entire contents of the string and whatever is being added to it need to be copied into an entirely new memory buffer. In other words, if you have N strings to be joined, you need to allocate approximately N temporary strings, and the first substring gets copied ~N times. The last substring only gets copied once, but on average, each substring gets copied ~N/2 times, so for pieces of similar size the total copying work grows roughly with N^2 rather than N.
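To put numbers on that argument, here is a minimal sketch that counts how many characters get copied under each strategy. It is a simplified model, not a measurement: it assumes every += starts from an empty string and always builds a brand-new string, whereas real CPython can sometimes grow the string in place. The function names and the sample strings are made up for illustration.

```python
def chars_copied_by_repeated_concat(pieces):
    """Model: every += writes the old contents plus the new piece
    into a fresh buffer (no in-place resizing assumed)."""
    copied = 0
    current_len = 0
    for piece in pieces:
        current_len += len(piece)  # length of the newly built string
        copied += current_len      # all of it is written to the new buffer
    return copied


def chars_copied_by_join(pieces):
    """Model: each piece is written exactly once into one final buffer."""
    return sum(len(piece) for piece in pieces)


pieces = ["foo", "bar", "baz"]  # hypothetical sample data
print(chars_copied_by_repeated_concat(pieces))  # 3 + 6 + 9 = 18
print(chars_copied_by_join(pieces))             # 9
```

With 1000 one-character pieces the same model gives 500500 copied characters for repeated += against 1000 for the single-buffer approach, which is the quadratic versus linear gap.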

With ''.join, Python can figure out how much memory it needs up front and allocate a correctly sized buffer. It then copies each piece into the new buffer, so each piece is only copied once.
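The two-pass idea can be sketched in pure Python. This is only an illustration of the strategy, not how the interpreter implements join internally. It works on bytes rather than str so that a bytearray can stand in for the preallocated buffer, and join_bytes_sketch is a hypothetical name.

```python
def join_bytes_sketch(pieces):
    """Concatenate bytes objects with one sized allocation and one copy each."""
    # Pass 1: work out the total size before allocating anything.
    total = sum(len(piece) for piece in pieces)

    # Single allocation of exactly the right size.
    buf = bytearray(total)

    # Pass 2: copy each piece into its slot exactly once.
    offset = 0
    for piece in pieces:
        end = offset + len(piece)
        buf[offset:end] = piece  # same-length slice assignment, done in place
        offset = end

    # Converting to bytes makes one more copy; that is an artifact of this
    # sketch, which only exists to show the sizing and copying passes.
    return bytes(buf)


parts = [b"foo", b"bar", b"baz"]  # hypothetical sample data
print(join_bytes_sketch(parts) == b"".join(parts))  # True
```

In real code there is no reason to write this by hand: collect the pieces in a list and call ''.join (or b''.join) on it.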