Rolf of Saxony - 2 months ago
Python Question

Why does "".join() appear to be slower than +=

Despite this question, Why is ''.join() faster than += in Python?, its answers, and this great explanation of the code behind the curtain: https://paolobernardi.wordpress.com/2012/11/06/python-string-concatenation-vs-list-join/

My tests suggest otherwise and I am baffled.

Am I doing something simple, incorrectly? I'll admit that I'm fudging the creation of x a bit but I don't see how that would affect the outcome.

import time
x="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
y=""
t1 = (time.time())
for i in range(10000):
    y += x
t2 = (time.time())
#print (y)
print (t1,t2,"=",t2-t1)


(1473524757.681939, 1473524757.68521, '=', 0.0032711029052734375)

import time
x="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
y=""
t1 = (time.time())
for i in range(10000):
    y = y + x
t2 = (time.time())
#print (y)
print (t1,t2,"=",t2-t1)


(1473524814.544177, 1473524814.547544, '=', 0.0033669471740722656)

import time
x=10000*"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
y=""
t1 = (time.time())
y= "".join(x)
t2 = (time.time())
#print (y)
print (t1,t2,"=",t2-t1)


(1473524861.949515, 1473524861.978755, '=', 0.029239892959594727)

As can be seen, "".join() is much slower, and yet we're told that it's meant to be quicker.

These values are very similar in both Python 2.7 and Python 3.4.

Edit:
OK, fair enough. I'll delete this once it reaches the maximum number of down votes.
The "one huge string" thing is the kicker.

import time
x=[]
for i in range(10000):
    x.append("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")
y=""
t1 = (time.time())
y= "".join(x)
t2 = (time.time())
#print (y)
print (t1,t2,"=",t2-t1)


(1473526344.55748, 1473526344.558409, '=', 0.0009288787841796875)

An order of magnitude quicker.
Mea culpa!

Answer

You called ''.join() on one huge string, not on a list of strings (multiplying a string just produces a longer string). This forces str.join() to iterate over that huge string, joining 740,000 individual 'x' characters. In other words, your third test does 74 times more work than the first two, which each append a 74-character string 10,000 times.
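The difference is easy to demonstrate in isolation (a small illustration, not the question's original timing code):

```python
# Iterating a string yields 1-character strings, so str.join()
# sees "ababab" as six tiny pieces rather than one chunk.
s = 3 * "ab"
print(list(s))          # every character becomes its own element
print("".join(s) == s)  # join reassembles them, one piece at a time

# A list of strings, by contrast, gives join() just 3 pieces of length 2:
print("".join(["ab", "ab", "ab"]) == s)
```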

To conduct a fair trial, you need to start with the same inputs for both, and use the timeit module to reduce the influence of garbage collection and other processes on your system.

That means both approaches need to work from a list of strings (your assignment examples rely on repeatedly adding a string literal, stored as a constant):

from timeit import timeit

testlist = ['x' * 74 for _ in range(100)]

def strjoin(testlist):
    return ''.join(testlist)

def inplace(testlist):
    result = ''
    for element in testlist:
        result += element
    return result

def concat(testlist):
    result = ''
    for element in testlist:
        result = result + element
    return result

for f in (strjoin, inplace, concat):
    timing = timeit('f(testlist)', 'from __main__ import f, testlist',
                    number=100000)
    print('{:>7}: {}'.format(f.__name__, timing))

On my MacBook Pro, on Python 3.5, this produces:

strjoin: 0.09923043003072962
inplace: 1.0032496969797648
 concat: 1.0027298880158924

On 2.7, I get:

strjoin: 0.118290185928
inplace: 0.85814499855
 concat: 0.867822885513

str.join() is still the winner here.