s5s s5s - 24 days ago 9
Python Question

merge_asof equivalent using pandas merge only

I have the following 2 DataFrames. A has the data and B has the weights for the data. B has the dates on which the weights become active as the index and 'level_1' has the entity for which the weight is relevant.

A = pd.DataFrame(index=pd.date_range(start='2016-01-15', periods=10, freq='B'))
B = pd.DataFrame(index=pd.date_range(start='2016-01-01', periods=5, freq='W'))
A["X"] = np.random.rand(A.shape[0])
A["Y"] = np.random.rand(A.shape[0])
A["Z"] = np.random.rand(A.shape[0])


B["X"] = np.random.rand(B.shape[0])
B["Y"] = np.random.rand(B.shape[0])
B["Z"] = np.random.rand(B.shape[0])

A = A.stack(dropna=False).reset_index(level=1)
B = B.stack(dropna=False).reset_index(level=1)


I want to end up with something like this at the end (which is equivalent to A with the weights in B applied in the 'weight' column)

level_1 0 weight
2016-01-15 X 0.456115 0.853601
2016-01-15 Y 0.849457 0.420065
2016-01-15 Z 0.216661 0.767833
2016-01-18 X 0.434117 0.165397
2016-01-18 Y 0.914670 0.338894
2016-01-18 Z 0.107617 0.130946
2016-01-19 X 0.575688 0.165397
2016-01-19 Y 0.147352 0.338894
2016-01-19 Z 0.950299 0.130946
2016-01-20 X 0.637323 0.165397
2016-01-20 Y 0.793915 0.338894
2016-01-20 Z 0.072430 0.130946
2016-01-21 X 0.991307 0.165397
2016-01-21 Y 0.685731 0.338894
2016-01-21 Z 0.889280 0.130946
2016-01-22 X 0.671650 0.165397
2016-01-22 Y 0.550121 0.338894
2016-01-22 Z 0.682943 0.130946
2016-01-25 X 0.660611 0.245470
2016-01-25 Y 0.146163 0.277813
2016-01-25 Z 0.219939 0.552831
2016-01-26 X 0.288662 0.245470
2016-01-26 Y 0.871490 0.277813
2016-01-26 Z 0.340407 0.552831
2016-01-27 X 0.416167 0.245470
2016-01-27 Y 0.274234 0.277813
2016-01-27 Z 0.686227 0.552831
2016-01-28 X 0.275956 0.245470
2016-01-28 Y 0.804975 0.277813
2016-01-28 Z 0.664723 0.552831


What trips me up is that the indices in B are not (or not necessarily) in A which means I cannot just add the data in B to A under a column then do A['weights'].fillna(method='ffill'). I can loop over I guess but this is slow and messy.

Answer

I think you need merge_asof:

np.random.seed(1234)
A = pd.DataFrame(index=pd.date_range(start='2016-01-15', periods=10, freq='B'))
B = pd.DataFrame(index=pd.date_range(start='2016-01-01', periods=5, freq='W'))
A["X"] = np.random.rand(A.shape[0])
A["Y"] = np.random.rand(A.shape[0])
A["Z"] = np.random.rand(A.shape[0])


B["X"] = np.random.rand(B.shape[0])
B["Y"] = np.random.rand(B.shape[0])
B["Z"] = np.random.rand(B.shape[0])

A = A.stack(dropna=False).reset_index(level=1)
B = B.stack(dropna=False).reset_index(level=1)

#print (A)
#print (B)
print (pd.merge_asof(A.reset_index(),
                     B.reset_index().rename(columns={0:'weight'}), on='index', by='level_1'))

       index level_1         0    weight
0  2016-01-15       X  0.191519  0.436173
1  2016-01-15       Y  0.357817  0.218792
2  2016-01-15       Z  0.364886  0.184287
3  2016-01-18       X  0.622109  0.802148
4  2016-01-18       Y  0.500995  0.924868
5  2016-01-18       Z  0.615396  0.047355
6  2016-01-19       X  0.437728  0.802148
7  2016-01-19       Y  0.683463  0.924868
8  2016-01-19       Z  0.075381  0.047355
9  2016-01-20       X  0.785359  0.802148
10 2016-01-20       Y  0.712702  0.924868
11 2016-01-20       Z  0.368824  0.047355
12 2016-01-21       X  0.779976  0.802148
13 2016-01-21       Y  0.370251  0.924868
14 2016-01-21       Z  0.933140  0.047355
15 2016-01-22       X  0.272593  0.802148
16 2016-01-22       Y  0.561196  0.924868
17 2016-01-22       Z  0.651378  0.047355
18 2016-01-25       X  0.276464  0.143767
19 2016-01-25       Y  0.503083  0.442141
20 2016-01-25       Z  0.397203  0.674881
21 2016-01-26       X  0.801872  0.143767
22 2016-01-26       Y  0.013768  0.442141
23 2016-01-26       Z  0.788730  0.674881
24 2016-01-27       X  0.958139  0.143767
25 2016-01-27       Y  0.772827  0.442141
26 2016-01-27       Z  0.316836  0.674881
27 2016-01-28       X  0.875933  0.143767
28 2016-01-28       Y  0.882641  0.442141
29 2016-01-28       Z  0.568099  0.674881
Comments