Trexion Kameha Trexion Kameha - 1 month ago 8
Python Question

Python: Lookup value between two other columns in another dataframe

I have a dataframe with IP addresses:

df_users.head(5)
user_id ip_address
0 22058 7.327584e+08
1 333320 3.503114e+08
2 1359 2.621474e+09
3 150084 3.840542e+09
4 221365 4.155831e+08


I have a lookup dataframe where the ip_address as a low and a high, mapped to a country:

df_ip.head(5)
lower_bound_ip_address upper_bound_ip_address country
0 16777216.0 16777471 Australia
1 16777472.0 16777727 China
2 16777728.0 16778239 China
3 16778240.0 16779263 Australia
4 16779264.0 16781311 China


I would like to add 'country' into the above dataframe as a new column by looking up the ip_address between the lower and upper bound.

ip_data=df_fraud.ip_address.head(5)
k=[]
for i in range(len(ip_data)):
ip=ip_data[i]
tmp=df_ip[(df_ip['lower_bound_ip_address']<=ip) & (df_ip['upper_bound_ip_address']>=ip)]
tmp=tmp['country'].tolist()
k.append(tmp)
df_user['country']=k


However this returns:

user_id ip_address country
0 22058 7.327584e+08 [Japan]
1 333320 3.503114e+08 [United States]
2 1359 2.621474e+09 [United States]
3 150084 3.840542e+09 []
4 221365 4.155831e+08 [United States]


The country column has those brackets in there. What am I doing wrong here?

Anyone have any idea what I am doing wrong? In SQL this is a very easy join with "between" as the operator. Thank you.

Answer

apply is my usual go to solution in cases like this:

def lookup_country(ip):

    return df_ip[(df_ip['lower_bound_ip_address']<ip) & (df_ip['upper_bound_ip_address']>=ip)]['country']

df_users['country'] = df_users.apply(lambda row: lookup_country(row['ip_address']), axis=1)

There is probably a join, merge or concatenation solution to this, but this is the more obvious way to solve the problem, I think.

Note: this solution relies on the conditions returning only a single entry.