Night Walker Night Walker - 4 months ago 9
Python Question

keys of first data frame that appear in second one and flag that fact

I have two Data Frames:

data = {'year': ['11:23:19', '11:23:19', '11:24:19', '11:25:19', '11:25:19', '11:23:19', '11:23:19', '11:23:19', '11:23:19', '11:23:19'],
'store_number': ['1944', '1945', '1946', '1948', '1948', '1949', '1947', '1948', '1949', '1947'],
'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV', 'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11],
'id': [10, 10, 11, 11, 11, 10, 10, 11, 11, 10]}

df = pd.DataFrame(data, columns=['retailer_name', 'store_number', 'year', 'amount', 'id'])
df.set_index(['retailer_name', 'store_number', 'year'], inplace=True)


And the second one:

data2 = {'year': ['11:23:19', '11:23:19', '13:23:19'],
'store_number': [1944, 1947, 1978],
'retailer_name': ['Walmart', 'CRV', 'CRV12'],
'amount': [5, 11, 11]}
df2 = pd.DataFrame(data2, columns=['retailer_name', 'store_number', 'year', 'amount'])
df2.set_index(['retailer_name', 'store_number', 'year'], inplace=True)

amount id
retailer_name store_number year
Walmart 1944 11:23:19 5 10
1945 11:23:19 5 10
CRV 1946 11:24:19 8 11
1948 11:25:19 6 11
11:25:19 1 11
Walmart 1949 11:23:19 5 10
1947 11:23:19 10 10
CRV 1948 11:23:19 6 11
1949 11:23:19 12 11
1947 11:23:19 11 10


amount
retailer_name store_number year
Walmart 1944 11:23:19 5
CRV 1947 11:23:19 11
CRV12 1978 13:23:19 11


How I can check the keys of df2 that appear in df, and flag 1 on those that do appear and 0 if not:

amount flag
retailer_name store_number year
Walmart 1944 11:23:19 5 1
CRV 1947 11:23:19 11 1
CRV12 1978 13:23:19 11 0

Answer

you can use MultiIndex.intersection() method if you make sure that both multiindexes have the same dtypes:

In [74]: df2['flag'] = 0

In [75]: df2.ix[df2.index.intersection(df.index), 'flag'] = 1
c:\envs\py35\lib\site-packages\IPython\terminal\ipapp.py:344: PerformanceWarning: indexing past lexsort depth may impact performance.
  self.shell.mainloop()

In [76]: df2
Out[76]:
                                     amount  flag
retailer_name store_number year
Walmart       1944         11:23:19       5     1
CRV           1947         11:23:19      11     1
CRV12         1978         13:23:19      11     0

NOTE: it won't work with your sample DFs because column store_number has different dtypes: string in df and int in df2

Comments