Hound - 6 months ago
Python Question

Pandas .cut and .isin functionality

I have two dataframes. First:

import numpy as np
import pandas as pd

s = pd.Series( ["(-20, -10]", "(-140, -130]", "(0,  10]"], dtype = "category")
t = pd.Series( ["(0, 50]", "(100, 150]", "(-50, 0]"], dtype = "category")
df_loc = pd.DataFrame({'loc_x_bin': s, 'loc_y_bin': t })
df_loc

[out]:
loc_x_bin      loc_y_bin
(-20, -10]     (0, 50]
(-140, -130]   (100, 150]
(0, 10]        (-50, 0]


Second:

a = pd.Series( [-15, 30, 5, -135, 5, -15])
b = pd.Series( [25, 35, -45, -200, 25, 25])
data = pd.DataFrame({'loc_x': a, 'loc_y': b})
data

[out]:
loc_x   loc_y
-15     25
30      35
5       -45
-135    -200
5       25
-15     25


I am trying to figure out whether loc_x and loc_y are in the loc_x_bin and loc_y_bin of the same row. For more details, see this post: "Are values in one dataframe in bins of another dataframe?". What I am trying to figure out now is why the 3rd and 5th rows of the output below are False.

[in]: xstep = 10
[in]: pd.cut(data.loc_x, np.arange(-500, 500, xstep)).isin(df_loc.loc_x_bin)
[out]:
0 True
1 False
2 False*
3 True
4 False*
5 True


When I run the code below, it seems to me that (0, 10] is in df_loc.loc_x_bin, because that column includes a (0, 10] bin. So why am I getting False in the 3rd and 5th rows above?

[in]:
print(pd.cut(data.loc_x, np.arange(-500, 500, xstep)))
print(df_loc.loc_x_bin)

[out]:
0 (-20, -10]
1 (20, 30]
2 (0, 10]*
3 (-140, -130]
4 (0, 10]*
5 (-20, -10]

0 (-20, -10]
1 (-140, -130]
2 (0, 10]*
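
Two labels that look identical when printed can still differ by whitespace. The sketch below is one way to check, using plain Python strings rather than pandas objects. The list names are hypothetical, and the second space in the last hand-typed label is an assumption taken from the answer, not something the printed output makes obvious.

```python
# Labels as pd.cut printed them in the question (plain strings for illustration).
cut_labels = ["(-20, -10]", "(20, 30]", "(0, 10]",
              "(-140, -130]", "(0, 10]", "(-20, -10]"]

# Hand-typed bin labels; the double space in the last one is an assumption
# taken from the answer.
typed_labels = ["(-20, -10]", "(-140, -130]", "(0,  10]"]

# repr() and len() make stray whitespace easier to spot than a plain print.
for label in typed_labels:
    print(repr(label), len(label))

# Exact string membership, one result per row of the data.
matches = [label in typed_labels for label in cut_labels]
print(matches)
```

Under that assumption, the membership results reproduce the pattern in the question: the two (0, 10] rows fail to match even though the labels read the same at a glance.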

Answer

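Note the extra space in "(0,  10]" in the code above. The labels are compared as exact strings, so it does not match the "(0, 10]" label that pd.cut produces: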

s = pd.Series( ["(-20, -10]", "(-140, -130]", "(0,  10]"], dtype = "category")

It should be:

s = pd.Series( ["(-20, -10]", "(-140, -130]", "(0, 10]"], dtype = "category")
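
If the labels are typed by hand, it may be safer to normalize whitespace on both sides before comparing instead of relying on exact typing. The following is a minimal sketch in plain Python; normalize_label and the list names are hypothetical helpers introduced for illustration, not part of pandas.

```python
import re

def normalize_label(label):
    """Collapse runs of whitespace to a single space and trim both ends."""
    return re.sub(r"\s+", " ", label).strip()

# Hand-typed labels, including the one with the stray second space.
typed_labels = ["(-20, -10]", "(-140, -130]", "(0,  10]"]

# Labels as pd.cut printed them in the question (plain strings for illustration).
cut_labels = ["(-20, -10]", "(20, 30]", "(0, 10]",
              "(-140, -130]", "(0, 10]", "(-20, -10]"]

# Normalize both sides, then test membership.
clean_labels = {normalize_label(label) for label in typed_labels}
matches = [normalize_label(label) in clean_labels for label in cut_labels]
print(matches)
```

With pandas, the same helper could be applied to the label column with Series.map before calling .isin. Note that recent pandas versions return Interval categories from pd.cut rather than strings, so the cut result would likely need converting with astype(str) first; check this against the version you are running.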