rugrln rugrln - 3 months ago 16
Python Question

Numpy Matrix inside Python class exhibiting linked behaviour?

Suppose you define a class like this in Python:

import numpy as np

class Foo:

    def __init__(self, data):
        self.data = data
        self.data_copy = self.copy(self.data)

    def copy(self, data):
        a = []
        for e in data:
            a.append(e)
        return a

    def change(self, val):
        for i in range(0, len(self.data_copy)):
            self.data_copy[i] += val


And then create an instance of the class such as:

a = Foo([np.matrix([1,2,3]), np.matrix([5,6,7])])


Now if we call a.change(np.matrix([5,5,2])), which should only modify the self.data_copy list, the self.data list also updates with the changes. It appears that, even after making a new list, the NumPy matrices in the two lists remain linked.

This is a nice feature in some respects, but it does not happen if I pass in an ordinary Python list of numbers. Is this a bug, or just a side effect of how NumPy matrices are copied? And if so, what's the best way to replicate the behaviour with ordinary Python lists?

Answer

When you make your "copy", you're just making a new list that contains the same objects as the old list. That is, when you iterate through data you're iterating through references to the objects in it, and when you append e you're appending a reference rather than a new or copied object. Any change made to one of those objects is therefore visible through every list that references it. It seems like what you want is copies of the actual matrices. To get them, in your copy method, instead of appending e, append something like e.copy() or np.array(e, copy=True). Note that np.array returns a plain ndarray rather than an np.matrix, while e.copy() keeps the matrix type. Either way you get true copies of the data and not just new references to the old objects.
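A minimal sketch of that fix, written as a standalone function rather than a method. The name copy_elements is made up for illustration, np.array is used in place of np.matrix (the aliasing behaves the same way), and the sketch assumes every element has a .copy() method:

```python
import numpy as np

def copy_elements(data):
    # Hypothetical helper: build a new list holding an independent
    # copy of each element (assumes each element has a .copy() method).
    return [e.copy() for e in data]

original = [np.array([1, 2, 3]), np.array([5, 6, 7])]
copied = copy_elements(original)

# The in-place add now touches only the copies.
for m in copied:
    m += np.array([5, 5, 2])

print(original[0])  # still holds 1, 2, 3
print(copied[0])    # holds 6, 7, 5
```

With the original append(e) version, both lists would show the updated values because they would hold the very same array objects.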

More generally, Python objects are effectively always passed by reference. This doesn't matter for immutable objects (strings, integers, tuples, etc.), but for lists, dictionaries, or user-defined classes that can mutate, you will need to make explicit copies. Often the built-in copy module, or simply constructing a new object directly from the old one, is what you want.
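As a small illustration of the copy module mentioned above, the sketch below contrasts copy.copy (shallow) with copy.deepcopy on a nested list. The variable names are arbitrary:

```python
import copy

nested = [[1, 2, 3], [5, 6, 7]]

shallow = copy.copy(nested)    # new outer list, same inner lists
deep = copy.deepcopy(nested)   # new outer list and new inner lists

nested[0].append(99)           # mutate an inner list in place

print(shallow[0])  # shares the inner list with nested, so it sees 99
print(deep[0])     # independent of nested, so it does not
```

A shallow copy is roughly what the hand-written copy method in the question does, which is why it was not enough there.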

Edit: I now see what you mean; I had slightly misunderstood your original question. You're referring to += mutating the matrix objects rather than behaving like self = self + other. This is simply how += works for most mutable Python collection types: += is a separate operation (the __iadd__ method), distinct from adding and then assigning the result. You will see the same behavior with normal Python lists:

>>> a = [1, 2, 3]
>>> b = a
>>> b += [4]
>>> print(a)
[1, 2, 3, 4]

You can see that += mutates the original object rather than creating a new one and rebinding b to it. However, if you start again from a fresh a and b and use + with an assignment instead:

>>> a = [1, 2, 3]
>>> b = a
>>> b = b + [4]
>>> print(a)
[1, 2, 3]
>>> print(b)
[1, 2, 3, 4]

This has the desired behavior. The + operator for collections (lists, NumPy arrays) does indeed return a new object, whereas += usually just mutates the old one.
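The same contrast can be sketched with NumPy arrays. The names x and alias are illustrative only, and np.array stands in for np.matrix:

```python
import numpy as np

x = np.array([1, 2, 3])
alias = x                              # second name for the same array

alias += np.array([5, 5, 2])           # in place: x sees the change
in_place_shared = alias is x

alias = alias + np.array([1, 1, 1])    # builds a new array, rebinds alias
rebound_shared = alias is x

print(x, alias, in_place_shared, rebound_shared)
```

Applied to the question's change method, writing self.data_copy[i] = self.data_copy[i] + val instead of using += would leave the objects in self.data untouched, since each slot in data_copy is rebound to a newly created object.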
