Whud Whud - 1 month ago 7
Python Question

What is the most efficient way to compare every value of 2 numpy matrices?

I'd like to more efficiently take every value of 2 matrices (a and b) of the same size and return a third boolean matrix c (or a 1/0 matrix, to keep things clean) containing the results of the conditions.

Example:

Condition:
For a == 0 and b == 3


a = [[1 0]
[0 1]]

b = [[3 5]
[3 9]]


Would return:

c = [[0 0]
[1 0]]


[1,0] (second row, first column) is the only place where a == 0 and b == 3, so it is the only place that is True in c.

This is the code I have so far:

import numpy as np

a = np.matrix("1, 0; 0 1")
print(a, '\n')
b = np.matrix("3, 5; 3 9")
print(b, '\n')

# Build c element by element (slow for large matrices)
c = []
for x in range(np.shape(a)[0]):      # rows
    row = []
    for y in range(np.shape(a)[1]):  # columns
        # the int() is there just to keep things tidy for the 3 prints
        row.append(int(a[x, y] == 0 and b[x, y] == 3))
    c.append(row)
c = np.matrix(c)
print(c)


results:

[[1 0]
[0 1]]

[[3 5]
[3 9]]

[[0 0]
[1 0]]


I could also use:

a=a==0
b=b==3
c=a&b


But that would require making copies of a and b. With big matrices, would that still be efficient?

Why can't I just use a == 0 & b == 3?

I need to do a comparison like this for several matrices of size 1000+, so you can see why iterating through them would be quite slow.

Thank you very much for any help. I'm sure the answer is something simple and right in front of me, but I'm just dumb.

Answer

You can use pretty much the expression that you wanted:

>>> (a == 0) & (b == 3)
matrix([[False, False],
        [ True, False]], dtype=bool)

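Beware, you need the parentheses to make the precedence work out as you'd like: normally & binds tighter than ==. Without them, a == 0 & b == 3 is parsed as the chained comparison a == (0 & b) == 3, which raises a ValueError for arrays with more than one element. If you don't like the extra parentheses, you can use the more verbose (though arguably more semantically correct) np.logical_and function.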
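
To make the precedence point concrete, here is a small sketch comparing the parenthesised form, np.logical_and, and the unparenthesised expression. It uses plain np.array inputs rather than np.matrix (an assumption made for simplicity; the comparison logic is the same), and the variable names are only illustrative.

```python
import numpy as np

a = np.array([[1, 0], [0, 1]])
b = np.array([[3, 5], [3, 9]])

# Parenthesised form: each comparison is evaluated first, then combined.
c = (a == 0) & (b == 3)

# Equivalent spelling; the function call makes extra parentheses unnecessary.
c_alt = np.logical_and(a == 0, b == 3)

# Without parentheses, Python parses this as a == (0 & b) == 3, a chained
# comparison that ends up calling bool() on a multi-element array.
try:
    a == 0 & b == 3
    unparenthesised_failed = False
except ValueError:
    unparenthesised_failed = True

print(c)
print(unparenthesised_failed)
```

Both c and c_alt hold the same boolean mask, while the unparenthesised expression fails with a ValueError instead of silently producing a wrong answer.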

Also note that while no copies are being made, there are temporary arrays being created. Specifically, the results of a == 0 and b == 3 are both allocated and freed within this statement. Generally, that's not such a big deal, and numpy's vectorized operations remain fast. However, if that isn't fast enough for you, you can use a library like numexpr to avoid the temporary arrays:

>>> import numexpr
>>> numexpr.evaluate('(a == 0) & (b == 3)')
array([[False, False],
       [ True, False]], dtype=bool)
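
If installing numexpr is not an option, a different technique that stays within plain NumPy is to pass preallocated buffers through the out= argument of the ufuncs, so that repeated comparisons over many same-shaped matrices reuse the same memory. This is only a sketch: match_into is a hypothetical helper name, and whether it is actually faster than the one-line expression depends on the array sizes, so it is worth timing on real data.

```python
import numpy as np


def match_into(a, b, out, scratch):
    """Write (a == 0) & (b == 3) into `out`, using `scratch` as workspace.

    `out` and `scratch` are caller-supplied boolean arrays with the same
    shape as `a` and `b`, so no new arrays are allocated here.
    """
    np.equal(a, 0, out=out)                 # out     <- (a == 0)
    np.equal(b, 3, out=scratch)             # scratch <- (b == 3)
    np.logical_and(out, scratch, out=out)   # out     <- out & scratch
    return out


a = np.array([[1, 0], [0, 1]])
b = np.array([[3, 5], [3, 9]])

# Allocate the buffers once and reuse them for every pair of matrices.
out = np.empty(a.shape, dtype=bool)
scratch = np.empty(a.shape, dtype=bool)

c = match_into(a, b, out, scratch)
print(c)
```

Note that the result shares memory with out, so it should be copied if it has to survive the next call.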

And of course, if you need 1 and 0, you can use result.astype(int) on the output array to make arrays of ints rather than booleans.
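
As a short illustration of that conversion, the sketch below builds the boolean mask and then casts it to integers. It again assumes np.array inputs in place of np.matrix; mask and c are just illustrative names.

```python
import numpy as np

a = np.array([[1, 0], [0, 1]])
b = np.array([[3, 5], [3, 9]])

mask = (a == 0) & (b == 3)   # boolean result
c = mask.astype(int)         # same shape, True -> 1 and False -> 0

print(c)
# [[0 0]
#  [1 0]]
```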
