Stefan Pochmann Stefan Pochmann - 2 months ago 11
Python Question

Hidden references to function arguments causing big memory usage?

Edit: Never mind, I was just being completely stupid.

I came across code with recursion on smaller and smaller substrings, here's its essence plus my testing stuff:

def f(s):
if len(s) == 2**20:
input('check your memory usage again')
else:
f(s[1:])
input('check your memory usage, then press enter')
f('a' * (2**20 + 500))


Before the call, my Python process takes about 9 MB (as checked by Windows task manager). After the 500 levels with ~1MB strings, it's at about 513 MB. No surprise, as each call level is still holding on to its string in its
s
variable.

But I tried to fix it by replacing the reference to the string with a reference to the new string and it still goes up to 513 MB:

def f(s):
if len(s) == 2**20:
input('check your memory usage again')
else:
s = s[1:]
f(s)
input('check your memory usage, then press enter')
f('a' * (2**20 + 500))


Why doesn't that let go off the memory? The strings even only get smaller, so later strings would easily fit into the space of earlier strings. Are there hidden additional references to the strings somewhere or what is going on?

I had expected it to behave like this, which only goes up to 10 MB (a change of 1 MB, as expected because the new string is built while the old string still exists):

input('check your memory usage, then press enter')
s = 'a' * (2**20 + 500)
while len(s) != 2**20:
s = s[1:]
input('check your memory usage again')


(Never mind the poor time complexity, btw, I know that, don't bother.)

Answer

Your function is recursive, so when you call f(), your current frame is put onto a stack, and a new one is created. So basically each function call keeps a reference to the new string it creates to pass down to the next call.

To illustrate the stack

import traceback

def recursive(x):
    if x:
        recursive(x[1:])
    else:
        traceback.print_stack()

recursive('abc')

Gives

$ python tmp.py
  File "tmp.py", line 10, in <module>
    recursive('abc')
  File "tmp.py", line 5, in recursive
    recursive(x[1:])
  File "tmp.py", line 5, in recursive
    recursive(x[1:])
  File "tmp.py", line 5, in recursive
    recursive(x[1:])
  File "tmp.py", line 7, in recursive
    traceback.print_stack()

When the final call to recursive() returns, it returns back into the next call above it which still has the reference to x.

But I tried to fix it by replacing the reference to the string with a reference to the new string and it still goes up to 513 MB

Well you did in the current function being called, but the function which called it still has the reference to what was passed in. e.g.

def foo(x):
    print "foo1", locals()
    bar(x)
    print "foo2", locals()

def bar(x):
    print "bar1", locals()
    x = "something else"
    print "bar2", locals()

foo('original thing')

When foo() is called, it passes the string 'original thing' to bar(). And even though bar() then gets rid of the reference, the current call above to foo() still has the reference

$ python tmp_x.py 
foo1 {'x': 'original thing'}
bar1 {'x': 'original thing'}
bar2 {'x': 'something else'}
foo2 {'x': 'original thing'}

I hope that illustrates it. I have been a little vague in my first statement about stack frames.