Fabio Lamanna - 22 days ago
Python Question

numpy boolean comparison between big arrays returns False instead of boolean array

I've just encountered the following issue. Starting from two arrays and performing a boolean comparison like:

In [47]: a1 = np.random.randint(0,10,size=1000000)

In [48]: a2 = np.random.randint(0,10,size=1000000)

In [52]: a1[:,None] == a2
Out[52]: False


returns a Boolean value instead of an array of booleans, whereas:

In [62]: a1 = np.random.randint(0,10,size=10000)

In [63]: a2 = np.random.randint(0,10,size=10000)

In [64]: a1[:,None] == a2
Out[64]:
array([[False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [ True, False, False, ..., False, False, False],
       [False, False, False, ...,  True, False, False]], dtype=bool)


works as expected. Is this an issue related to the sizes of the arrays? A plain elementwise comparison of the two one-dimensional arrays works, no matter the size:

In [65]: a1 = np.random.randint(0,10,size=1000000)

In [66]: a2 = np.random.randint(0,10,size=1000000)

In [67]: a1 == a2
Out[67]: array([False, False, False, ..., False, False, True], dtype=bool)


Is anyone able to reproduce the problem? I'm on NumPy 1.9.2 and Python 2.7.3.

EDIT: I just updated to NumPy 1.11, but the issue persists.

Answer

When I try the comparison, I get a warning:

[...]/__main__.py:1: DeprecationWarning: elementwise == comparison failed;
this will raise an error in the future.
    if __name__ == '__main__':

This warning is triggered in NumPy's code here:

if (result == NULL) {
    /*
     * Comparisons should raise errors when element-wise comparison
     * is not possible.
     */
    /* 2015-05-14, 1.10 */
    PyErr_Clear();
    if (DEPRECATE("elementwise == comparison failed; "
                  "this will raise an error in the future.") < 0) {
        return NULL;
    }

This branch is reached because result == NULL, where result is the outcome of NumPy's attempt to perform the requested operation (the elementwise equality check, which involves broadcasting the two arrays against each other).
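As a rough illustration of what broadcasting asks for here, the shape of the would-be result can be inspected without allocating it. This is only a sketch for Python 3: the helper name `outer_comparison_shape` is made up for this example, and the zero-filled arrays are stand-ins for the random arrays in the question.

```python
import numpy as np

def outer_comparison_shape(a, b):
    """Shape that `a[:, None] == b` would have, computed without allocating it."""
    # np.broadcast builds a lightweight broadcasting object; no result array is created.
    return np.broadcast(a[:, None], b).shape

# Stand-in arrays with the same lengths as in the question (contents are irrelevant here).
a1 = np.zeros(10**6, dtype=np.int8)
a2 = np.zeros(10**6, dtype=np.int8)

shape = outer_comparison_shape(a1, a2)
n_elements = shape[0] * shape[1]
print(shape, n_elements)
```

The two inputs are only one megabyte each in this sketch; it is the two-dimensional result of the comparison that is large.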

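Why did this operation fail and return NULL? Very likely because NumPy needed to allocate a huge chunk of memory for the result: broadcasting a (1000000, 1) array against a (1000000,) array yields a 1000000 x 1000000 array, enough to hold 10^12 booleans. At one byte per boolean, that is about 931 GiB. NumPy couldn't allocate this, so the operation returned NULL instead.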
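The size estimate can be checked with a few lines of arithmetic. The sketch below also shows one possible workaround, assuming that only an aggregate (here, the number of matching pairs) is needed rather than the full boolean matrix. The function names and the chunk size are hypothetical choices for this example, not anything taken from NumPy itself.

```python
import numpy as np

def comparison_gib(n, m):
    """Approximate memory, in GiB, for an n x m boolean array (one byte per element)."""
    return n * m * np.dtype(bool).itemsize / 2**30

def count_matches_chunked(a1, a2, chunk=1024):
    """Count pairs (i, j) with a1[i] == a2[j] while holding only `chunk` rows at a time."""
    total = 0
    for start in range(0, a1.size, chunk):
        # Each block has shape (<= chunk, a2.size), so memory use stays bounded.
        block = a1[start:start + chunk, None] == a2
        total += int(np.count_nonzero(block))
    return total

print(round(comparison_gib(10**6, 10**6), 2))  # memory needed for the comparison in the question

# Small demonstration arrays, so the chunked result can be checked against the direct one.
rng = np.random.RandomState(0)
small1 = rng.randint(0, 10, size=3000)
small2 = rng.randint(0, 10, size=3000)
print(count_matches_chunked(small1, small2) == int((small1[:, None] == small2).sum()))
```

With the default chunk size and arrays of one million elements, each block would take roughly 1 GB, so a smaller `chunk` may be more appropriate on machines with little memory.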
