Fabio Lamanna - 1 year ago 155

# numpy boolean comparison between big arrays returns False instead of boolean array

I've just encountered the following issue. Starting from two arrays and performing a boolean comparison like:

``````
In [47]: a1 = np.random.randint(0,10,size=1000000)

In [48]: a2 = np.random.randint(0,10,size=1000000)

In [52]: a1[:,None] == a2
Out[52]: False
``````

returns a Boolean value instead of an array of booleans, whereas:

``````
In [62]: a1 = np.random.randint(0,10,size=10000)

In [63]: a2 = np.random.randint(0,10,size=10000)

In [64]: a1[:,None] == a2
Out[64]:
array([[False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [ True, False, False, ..., False, False, False],
       [False, False, False, ...,  True, False, False]], dtype=bool)
``````

works as expected. Is this an issue related to the sizes of the arrays? Performing a simple comparison on the single dimension of the array works, no matter the size.

``````
In [65]: a1 = np.random.randint(0,10,size=1000000)

In [66]: a2 = np.random.randint(0,10,size=1000000)

In [67]: a1 == a2
Out[67]: array([False, False, False, ..., False, False,  True], dtype=bool)
``````

Is anyone able to reproduce the problem? I'm on NumPy 1.9.2 and Python 2.7.3.

EDIT: I just updated to NumPy 1.11, but the issue persists.

When I try the comparison, I get a warning:

``````
[...]/__main__.py:1: DeprecationWarning: elementwise == comparison failed; this will raise an error in the future.
  if __name__ == '__main__':
``````

This warning is triggered in NumPy's code here:

``````
if (result == NULL) {
    /*
     * Comparisons should raise errors when element-wise comparison
     * is not possible.
     */
    /* 2015-05-14, 1.10 */
    PyErr_Clear();
    if (DEPRECATE("elementwise == comparison failed; "
                  "this will raise an error in the future.") < 0) {
        return NULL;
    }
``````

This branch is reached because `result == NULL`, where `result` holds the outcome of the operation NumPy was asked to perform (the elementwise equality check, which involves broadcasting the two arrays against each other).

Why did this operation fail and return `NULL`? Very possibly because NumPy needed to allocate a huge chunk of memory for the result array: enough to hold 10^12 booleans (1,000,000 × 1,000,000). At one byte per boolean that is about 931 GiB; the allocation failed, so the operation returned `NULL` instead.
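That size can be checked without allocating the result. The following is a minimal sketch, assuming one byte per boolean (the itemsize of NumPy's `bool_`); `broadcast_result_nbytes` is a hypothetical helper written for illustration, not a NumPy function.

```python
import numpy as np

def broadcast_result_nbytes(x, y, dtype=np.bool_):
    """Bytes needed to hold the elementwise result of broadcasting x against y.

    Hypothetical helper for illustration; not part of NumPy.
    """
    # np.broadcast only works out the broadcast shape; it does not
    # allocate the result array.
    shape = np.broadcast(x, y).shape
    n_elements = 1
    for dim in shape:
        n_elements *= int(dim)  # plain Python ints, so no overflow
    return n_elements * np.dtype(dtype).itemsize

a1 = np.random.randint(0, 10, size=1000000)
a2 = np.random.randint(0, 10, size=1000000)

# a1[:, None] has shape (1000000, 1) and a2 has shape (1000000,),
# so the comparison result would have shape (1000000, 1000000).
nbytes = broadcast_result_nbytes(a1[:, None], a2)
print(nbytes)            # 1000000000000
print(nbytes / 2.0**30)  # roughly 931.3 (GiB)
```

For the 10000-element arrays in the question, the same calculation gives 10^8 bytes (about 95 MiB), which is consistent with that comparison succeeding.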
