Xaser Xaser - 1 year ago 77
Python Question

Python bug? equivalent functions - variable overflow in one but not in the other

I have the following two (supposedly equivalent) functions and want to see which one executes faster (they will be used to process a large data set):

import numpy as np

def interval_energy(array, start_intensity, intensity_window_length):
    bins = np.bincount(array.ravel())
    energy = 0
    for i in range(start_intensity, min(start_intensity + intensity_window_length, len(bins))):
        energy += bins[i] * (i ** 2)

    print("Energy: {}".format(energy))
    return energy

def interval_energy2(array, start_intensity, intensity_window_length):
    flat_array = array.ravel()
    energy = 0
    for i in range(0, array.size):
        if start_intensity <= flat_array[i] < (start_intensity + intensity_window_length):
            energy += flat_array[i] ** 2

    print("Energy2: {}".format(energy))
    return energy

I'm using the following code to test them:

if __name__ == '__main__':
    import timeit
    setup = """
from interval_energy import interval_energy, interval_energy2
import numpy as np
a = np.random.randint(0, 3000, (150, 150, 150))
"""

    t = timeit.Timer('interval_energy(a, 50, 2450)', setup)
    t2 = timeit.Timer('interval_energy2(a, 50, 2450)', setup)
    t3 = timeit.Timer("""
interval_energy(a, 50, 2450)
interval_energy2(a, 50, 2450)
""", setup)

    # Run each timer once (the original snippet stops before this step).
    print(t.timeit(number=1))
    print(t2.timeit(number=1))
    print(t3.timeit(number=1))


In interval_energy2, however, the energy variable overflows and the following warning is raised:

RuntimeWarning: overflow encountered in long_scalars

Update 1: I have noticed that in the first version, energy is of type int when it is created and int64 when it is returned, whereas in the second version it is also of type int when it is created but has become int32 by the time it is returned, hence the overflow. Why does Python automatically convert the variable in one case but not in the other?

Update 2: It has been established that the two functions, in theory, produce the same result.

Update 3: I'm using Python 3.5.2, 64-bit. I have read that Python 3 only uses long (arbitrary-precision) integers, so what I see here (a 32-bit integer overflow) should not even be possible? Possibly it is caused by the C code underneath pandas / NumPy.

Update 4: Possibly a bug in CPython for Windows, as the identical code works fine on OS X / Unix (same Python and NumPy versions on both systems).

Answer

Found it. This is a good question:

print type(flat_array[3])
<type 'numpy.int32'>

but, after the bincount:

print type(bins[3])
<type 'numpy.int64'>

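Apparently the binning converted the data type without you noticing: np.bincount returned 64-bit counts even though the input array holds 32-bit integers. This is why the fix by f5r5e5d worked. Strictly speaking, you should have hit the overflow in both functions, but the first one was spared. Change your array definition: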

a = np.random.randint(0, 3000, (150, 150, 150), dtype=np.int64)

as f5r5e5d suggested. With that change I get no warning, and the two results are close but not identical; whether that matters is up to you.
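
To see the effect in isolation, here is a minimal sketch. It pins the dtypes explicitly (an assumption made for illustration, since the default integer dtype of randint depends on the platform and NumPy version), and the names values, squares, wrapped, exact and python_total are hypothetical. It shows that bincount counts into NumPy's pointer-sized integer type (int64 on a 64-bit build) whatever the input dtype is, and that the same sum wraps around in 32 bits but is exact in 64 bits or with plain Python integers.

```python
import numpy as np

# 1000 copies of 2000, stored as 32-bit integers (dtype pinned for illustration).
values = np.full(1000, 2000, dtype=np.int32)

# bincount counts into NumPy's pointer-sized integer type (np.intp),
# which is int64 on a 64-bit build, regardless of the input's dtype.
bins = np.bincount(values)

# Each square is 4_000_000 and still fits comfortably in int32.
squares = values ** 2

# Accumulating in 32 bits silently wraps past 2**31 - 1 ...
wrapped = int(np.sum(squares, dtype=np.int32))
# ... while accumulating in 64 bits gives the true total.
exact = int(np.sum(squares, dtype=np.int64))
# Plain Python integers have arbitrary precision and do not overflow.
python_total = sum(int(v) ** 2 for v in values)

print(values.dtype, bins.dtype)
print(wrapped, exact, python_total)
```

The overflow in the question is therefore a property of the fixed-width NumPy scalar that energy turns into, not of Python's own int type.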

EDIT: It currently seems that on versions after 2.7.9, where dtype is an accepted keyword argument, the default dtype is chosen according to the values given to the array. Initializing the accumulator with energy = np.int64() makes sure that the variable we expect to overflow is a 64-bit integer.
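
Putting that suggestion into the loop, a minimal sketch might look like the following. The function names interval_energy2_int64 and interval_energy_vectorized are hypothetical, and the vectorized variant is an alternative added here for comparison, not something taken from the question. Both keep every intermediate value in 64 bits, so the result should not depend on the dtype of the input array.

```python
import numpy as np

def interval_energy2_int64(array, start_intensity, intensity_window_length):
    """Loop version of interval_energy2 with an explicit 64-bit accumulator."""
    upper = start_intensity + intensity_window_length
    energy = np.int64(0)  # fixed 64-bit accumulator instead of a plain 0
    for value in array.ravel():
        if start_intensity <= value < upper:
            # Widen the element before squaring so the square is 64-bit too.
            energy += np.int64(value) ** 2
    return int(energy)

def interval_energy_vectorized(array, start_intensity, intensity_window_length):
    """Same computation without a Python-level loop."""
    flat = array.ravel().astype(np.int64)  # widen once, up front
    upper = start_intensity + intensity_window_length
    mask = (flat >= start_intensity) & (flat < upper)
    return int(np.sum(flat[mask] ** 2, dtype=np.int64))

if __name__ == "__main__":
    a = np.random.randint(0, 3000, (20, 20, 20)).astype(np.int32)
    print(interval_energy2_int64(a, 50, 2450))
    print(interval_energy_vectorized(a, 50, 2450))
```

Both functions return a plain Python int, so later arithmetic on the result cannot overflow either.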
