dleal - 4 months ago 11
Python Question

# compare whether value in one column is between two values in another column python pandas

I have two data frames as follow:

A = pd.DataFrame({"value":[3, 7, 5 ,18,23,27,21,29]})

B = pd.DataFrame({"low":[1, 6, 11 ,16,21,26], "high":[5,10,15,20,25,30], "name":["one","two","three","four","five", "six"]})

I want to find whether "value" in A is between 'high' and low' in B, and if so, I want to copy the column name from B to A.

The output should look like this:

A = pd.DataFrame({"value":[3, 7, 5 ,18,23,27,21,29], "name":["one","two","one","four","five", "six", "five", "six"]})

My function uses iterrows as follows:

def func1(row):
x = row['value']
for index,value in B.iterrows():
if ((value['low'] <= x) &(x<=value['high'])):
return value['name']

But it doesn't yet achieve what i want to do,

thank you,

You can use a list comprehension to iterate through the values in A, and then use loc to get the relevant mapped values. le is less than or equal to, and ge is greater than or equal to.

For example, v = 3 in the first row. Using simple boolean indexing:

>>> B[(B['low'].le(v)) & (B['high'].ge(v))]
high  low name
0     5    1  one

Assuming that DataFrame B does not have any overlapping ranges, then you will get back one row as above. One then uses loc to get the name column, as below. Because each returned name is a a series, you need get the first and only scalar value (using iat, for example).

A['name'] = [B.loc[(B['low'].le(v)) & (B['high'].ge(v)), 'name'].iat[0]
for v in A['value']]

>>> A
value  name
0      3   one
1      7   two
2      5   one
3     18  four
4     23  five
5     27   six
6     21  five
7     29   six