theideasmith theideasmith - 2 days ago 5
Python Question

How to interpret the performance of three python implementations of string reversal

I've been programming for about two years at a relatively informal level, that is not needing to care about performance or runtime of my implementations. Now I want to improve my computational rigor by actually computationally analyzing the runtime of my algorithms as opposed to just analytically.

Plotting the runtimes of various implementations of a string reversal algorithm yields results against my expectations.

I implemented four ways of reversing a string in python:


  1. Native
    string[::-1]

  2. Logarithmic reversal with concatenation (working on a similar principle as mergesort)

  3. Logarithmic reversal without concatenation

  4. Using loops



Now my expected order of performance in order of runtime
native < log-pure < log-concat < loop


Here's my graph of performance (note algorithm (1) is too fast to be seen on this scale)

enter image description here 3

What you see here is that the order of performance (on a logarithmic scale) is
native < loop < log-concat < log-pure
which is inconsistent with the expectation of:


  • Loop:
    O(n)

  • Log-concat:
    O(nlogn)

  • Log-pure:
    O(logn)

  • Native: ?



I really don't understand why I'm getting the results I'm getting.

Using native functions

def native_reverse(string):
return string[::-1]


Concatenative Logarithmic Reversal

def naive_fastreverse(string):
strlen = len(string)

# The recursion base case
if strlen==1:
return string

mid = ""
a = strlen/2
b = a

if strlen%2==1: #i.e. strlen is odd
mid = string[strlen/2]
b+=1
# else:
# use no mid letter
# use default b

half_a = string[0:a]
half_b = string[b:strlen]

return naive_fastreverse(half_b) + mid + naive_fastreverse(half_a)


Non-Concatenative Logarithmic Reversal

def optimized_fastreverse(string):
final = list(string)
strlen = len(string)

def computenode(start, end):
if end-start==1:
final[strlen-end] = string[end-1]
else:
computenode(start,start+(end-start)/2)
computenode(start+(end-start)/2, end)

computenode(0, strlen)
return ''.join(final)


Loop Based Reversal

def loopy_reverse(string):
final=list(string)
strlen=len(string)
for i in xrange(strlen):
final[strlen-i-1] = string[i]
final[i] = string[strlen-i-1]
return ''.join(final)


Here's my drawing code:

import time
import math
import random
import string
import numpy
Num = 5
def rand_str_gen(N):
return ''.join(
random.choice(
string.ascii_uppercase + string.digits)
for _ in range(N))

randstr = "a"*(10**Num) #rand_str_gen(10**Num)

def run(reverser, reversable):
start = time.time()
reverser(reversable)
end = time.time()
return end-start

def profile(reverser):
xs = np.arange(1,Num+1,0.1)
ys = []
for i in xs:
ys.append(run(reverser,randstr[1:int(10.**i)] ))
return xs ,ys

functions = [
native_reverse,
naive_fastreverse,
optimized_fastreverse,
loopy_reverse]

reversers = {
# Makes the function names look nice
(" ".join(
map(lambda s: s.capitalize(),
func.__name__.split("_")))): profile(func)
for func in functions}

%matplotlib inline
lengths = ["$10^%d$"%i for i in range(Num+1)]

from pylab import *
fig = figure(figsize=(20,10))
ax = fig.add_subplot(1,1,1)
plots = [
ax.plot(*data, label=name)[0]
for (data, name) in zip(reversers.values(), reversers.keys())
]

legend(handles=plots,loc=2)
ax.set_title("Algorithm Runtime (ms) by String Length")
ax.set_xticklabels(lengths)

Answer

You are not measuring the performance ofthe algorithm, but the behavior of the memory. Due to the multiple cache levels in a PC, there are difference regimes with very different access times. This makes practice completly diverge from theory.

And you are also timing the Python intepreter at the same time, which also distorts the results in an unknown way.

Comments