Hound Hound - 6 months ago 15
Python Question

Are values in one dataframe in bins of another dataframe?

I have a dataframe named loc_df with two columns of bins that looks like this:

> loc_df

loc_x_bin loc_y_bin
(-20, -10] (0, 50]
(-140, -130] (100, 150]
(0, 10] (-50, 0]


I have another dataframe called data that looks like this...

> data

loc_x loc_y
-15 25
30 35
5 -45
-135 -200


I want to make a new boolean column in data that shows whether loc_x is within the values of loc_x_bin and loc_y is within loc_y_bin of the dataframe loc_df. The resulting dataframe would look like this:

> data

loc_x loc_y in_bins
-15 25 true
30 35 false
5 -45 true
-135 -200 false

Answer

UPDATE:

If df_loc.dtypes doesn't show category for both columns, you may want to convert those columns to the category dtype first (here df_loc is the question's loc_df and df is its data):

df_loc.loc_x_bin = df_loc.loc_x_bin.astype('category')
df_loc.loc_y_bin = df_loc.loc_y_bin.astype('category')

Then you can bin the columns of df "on the fly", without storing the intermediate categories:

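import numpy as np
import pandas as pd

xstep = 10   # width of the x bins
ystep = 50   # width of the y bins

# Bin each coordinate on a regular grid, then test the bin against df_loc.
df['in_bins'] = (
    pd.cut(df.loc_x, np.arange(-500, 500, xstep)).isin(df_loc.loc_x_bin)
    & pd.cut(df.loc_y, np.arange(-500, 500, ystep)).isin(df_loc.loc_y_bin)
)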

Test:

In [130]: df['in_bins'] = (   (pd.cut(df.loc_x, np.arange(-500, 500, xstep)).isin(df_loc.loc_x_bin))
   .....:                     &
   .....:                     (pd.cut(df.loc_y, np.arange(-500, 500, ystep)).isin(df_loc.loc_y_bin))
   .....:                 )

In [131]: df
Out[131]:
   loc_x  loc_y in_bins
0    -15     25    True
1     30     35   False
2      5    -45    True
3   -135   -200   False
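
A note for newer pandas versions: pd.cut there returns Interval objects as categories rather than string labels, so isin against bins stored as plain strings may find no matches. Below is a minimal self-contained sketch of the same idea, assuming the bins in df_loc are stored as strings such as "(-20, -10]" and that all values lie inside the -500..490 grid used above. Converting the cut result with astype(str) is my assumption about how to line the two up, not part of the answer as originally tested.

```python
import numpy as np
import pandas as pd

# Sample frames from the question; bin labels are assumed to be plain strings.
df_loc = pd.DataFrame({
    "loc_x_bin": ["(-20, -10]", "(-140, -130]", "(0, 10]"],
    "loc_y_bin": ["(0, 50]", "(100, 150]", "(-50, 0]"],
})
df = pd.DataFrame({
    "loc_x": [-15, 30, 5, -135],
    "loc_y": [25, 35, -45, -200],
})

xstep, ystep = 10, 50

# astype(str) turns each Interval category into a label like "(-20, -10]",
# which can then be compared with the string labels in df_loc.
x_label = pd.cut(df["loc_x"], np.arange(-500, 500, xstep)).astype(str)
y_label = pd.cut(df["loc_y"], np.arange(-500, 500, ystep)).astype(str)

df["in_bins"] = x_label.isin(df_loc["loc_x_bin"]) & y_label.isin(df_loc["loc_y_bin"])
print(df)
```

The string comparison only works if the labels in df_loc are formatted the same way pandas formats its intervals (integer edges, a space after the comma, right-closed).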

Original answer:

You can do it this way, first storing the bin of each value in its own column:

df['x_cat'] = pd.cut(df.loc_x, np.arange(-500, 500, 10))
df['y_cat'] = pd.cut(df.loc_y, np.arange(-500, 500, 50))

Test:

In [118]: df
Out[118]:
   loc_x  loc_y         x_cat         y_cat
0    -15     25    (-20, -10]       (0, 50]
1     30     35      (20, 30]       (0, 50]
2      5    -45       (0, 10]      (-50, 0]
3   -135   -200  (-140, -130]  (-250, -200]

In [119]: (df.x_cat.isin(df_loc.loc_x_bin)) & (df.y_cat.isin(df_loc.loc_y_bin))
Out[119]:
0     True
1    False
2     True
3    False
dtype: bool

In [120]: df['in_bins'] = (df.x_cat.isin(df_loc.loc_x_bin)) & (df.y_cat.isin(df_loc.loc_y_bin))

In [121]: df
Out[121]:
   loc_x  loc_y         x_cat         y_cat in_bins
0    -15     25    (-20, -10]       (0, 50]    True
1     30     35      (20, 30]       (0, 50]   False
2      5    -45       (0, 10]      (-50, 0]    True
3   -135   -200  (-140, -130]  (-250, -200]   False
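
One thing to be aware of: the check above tests loc_x against all x bins and loc_y against all y bins independently, so a point whose x bin comes from one row of df_loc and whose y bin comes from another row still counts as True. If the bins are meant to be matched as pairs from the same row (the question doesn't say either way), a possible sketch is to compare (x_bin, y_bin) tuples instead. The column name in_bins_paired and the extra point (-15, -45) are hypothetical, added only to show the difference, and the bins are again assumed to be stored as strings.

```python
import numpy as np
import pandas as pd

df_loc = pd.DataFrame({
    "loc_x_bin": ["(-20, -10]", "(-140, -130]", "(0, 10]"],
    "loc_y_bin": ["(0, 50]", "(100, 150]", "(-50, 0]"],
})
# Question data plus one hypothetical point (-15, -45) that mixes two rows.
df = pd.DataFrame({
    "loc_x": [-15, 30, 5, -135, -15],
    "loc_y": [25, 35, -45, -200, -45],
})

x_label = pd.cut(df["loc_x"], np.arange(-500, 500, 10)).astype(str)
y_label = pd.cut(df["loc_y"], np.arange(-500, 500, 50)).astype(str)

# Independent check, as in the answer: each coordinate against its own column.
df["in_bins"] = x_label.isin(df_loc["loc_x_bin"]) & y_label.isin(df_loc["loc_y_bin"])

# Paired check: the (x_bin, y_bin) combination must occur in a single row.
allowed = set(zip(df_loc["loc_x_bin"], df_loc["loc_y_bin"]))
df["in_bins_paired"] = [pair in allowed for pair in zip(x_label, y_label)]
print(df)
```

For the four points in the question both checks agree; only the added mixed point differs (True for the independent check, False for the paired one).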