mnky9800n mnky9800n - 3 months ago 22
Python Question

numpy array converted to pandas dataframe drops values

I need to calculate statistics for each node of a 2D grid. I figured the easy way to do this was to take the cross join (AKA Cartesian product) of two ranges. I implemented this using numpy as this function:

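import numpy as np
from itertools import product

def node_grid(x_range, y_range, x_increment, y_increment):
    x_min = float(x_range[0])
    x_max = float(x_range[1])
    # np.linspace needs an integer point count, so round before casting
    x_num = int(round((x_max - x_min) / x_increment)) + 1
    y_min = float(y_range[0])
    y_max = float(y_range[1])
    y_num = int(round((y_max - y_min) / y_increment)) + 1

    x = np.linspace(x_min, x_max, x_num)
    y = np.linspace(y_min, y_max, y_num)

    # Cartesian product of the two axes: one (x, y) row per grid node
    ng = list(product(x, y))
    ng = np.array(ng)
    return ng, x, y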


However, when I convert this to a pandas dataframe it drops values. For example:

In [2]: ng = node_grid(x_range=(-60, 120), y_range=(0, 40), x_increment=0.1, y_increment=0.1)
In [3]: ng[0][(ng[0][:,0] > -31) & (ng[0][:,0] < -30) & (ng[0][:,1]==10)]
Out[3]: array([[-30.9,  10. ],
               [-30.8,  10. ],
               [-30.7,  10. ],
               [-30.6,  10. ],
               [-30.5,  10. ],
               [-30.4,  10. ],
               [-30.3,  10. ],
               [-30.2,  10. ],
               [-30.1,  10. ]])

In [4]: node_df = pd.DataFrame(ng[0])
node_df.columns = ['xx','depth']
print(node_df[(node_df.depth==10) & node_df.xx.between(-30,-31)])
Out[4]:Empty DataFrame
Columns: [xx, depth]
Index: []


The dataframe isn't empty:

In [5]: print(node_df.head())
Out[5]: xx depth
0 -60.0 0.0
1 -60.0 0.1
2 -60.0 0.2
3 -60.0 0.3
4 -60.0 0.4


Values from the numpy array seem to be dropped when they are put into the pandas dataframe. Why?

Answer

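Nothing is being dropped. The "between" function requires that the first argument be no greater than the second: between(left, right) tests left <= value <= right, so between(-30,-31) can never match any value. Swap the bounds: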

In: print(node_df[(node_df.depth==10) & node_df.xx.between(-31,-30)])
          xx  depth
116390 -31.0   10.0
116791 -30.9   10.0
117192 -30.8   10.0
117593 -30.7   10.0
117994 -30.6   10.0
118395 -30.5   10.0
118796 -30.4   10.0
119197 -30.3   10.0
119598 -30.2   10.0
119999 -30.1   10.0
120400 -30.0   10.0

For clarity, the product() function used comes from the itertools module, i.e., from itertools import product
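To see the effect of the bound order in isolation, here is a minimal sketch on a much smaller grid than the one in the question. The grid sizes and the variable names (grid, same_shape, reversed_hits, ordered_hits) are made up for illustration. Only the xx and depth column names come from the question.

```python
import numpy as np
import pandas as pd
from itertools import product

# Small stand-in grid (hypothetical sizes, not the question's 0.1 spacing)
x = np.linspace(-32.0, -29.0, 7)   # -32.0, -31.5, ..., -29.0
y = np.linspace(0.0, 1.0, 3)       # 0.0, 0.5, 1.0
grid = np.array(list(product(x, y)))

node_df = pd.DataFrame(grid, columns=["xx", "depth"])

# The conversion itself keeps every row of the array
same_shape = node_df.shape == grid.shape

# Reversed bounds: -30 <= xx <= -31 is never true, so nothing matches
reversed_hits = node_df[node_df.xx.between(-30, -31)]

# Ordered bounds: -31 <= xx <= -30, both ends included by default
ordered_hits = node_df[node_df.xx.between(-31, -30)]

print(same_shape, len(reversed_hits), len(ordered_hits))
```

On this toy grid the reversed call selects no rows, while the ordered call selects the rows with xx equal to -31.0, -30.5 and -30.0 at each of the three depths. Note that between includes both endpoints by default, which is why -31.0 and -30.0 show up in the output above even though the numpy filter in the question used strict inequalities.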
