Fagui Curtain Fagui Curtain - 6 months ago 38
Python Question

Pandas merge doesn't preserve sorting order

I am trying the following:

PoliceStations_raw=pd.DataFrame(
[['BAYVIEW' ,37.729732,-122.397981],
['CENTRAL' ,37.798732,-122.409919],
['INGLESIDE' ,37.724676,-122.446215],
['MISSION' ,37.762849,-122.422005],
['NORTHERN' ,37.780186,-122.432467],
['PARK' ,37.767797,-122.455287],
['RICHMOND' ,37.779928,-122.464467],
['SOUTHERN' ,37.772380,-122.389412],
['TARAVAL' ,37.743733,-122.481500],
['TENDERLOIN',37.783674,-122.412899]],columns=['PdDistrict','XX','YY'])


df1=pd.DataFrame([[0,'CENTRAL'],[1,'TARAVAL'],[3,'CENTRAL'],[2,'BAYVIEW']])
df1.columns = ['Index','PdDistrict']


Index PdDistrict
0 0 CENTRAL
1 1 TARAVAL
2 3 CENTRAL
3 2 BAYVIEW


despite having typed in
sort=False
, the returned object has merged the table but is using
PdDistrict
as some Index and has changed the order of the rows of the original left dataframe.

pd.merge(df1,PoliceStations_raw,sort=False)


returns this (note that the order of
PdDistrict
has changed)

Index PdDistrict XX YY
0 0 CENTRAL 37.798732 -122.409919
1 3 CENTRAL 37.798732 -122.409919
2 1 TARAVAL 37.743733 -122.481500
3 2 BAYVIEW 37.729732 -122.397981

Answer

You need to specify how you want the two dataframes merged. By default, an inner join is emulated by merge(). However, by specifying you want a left join, the sort order from df1 is preserved. Thus, you simply need to add on='left':

>>> pd.merge(df1, PoliceStations_raw, how='left')
   Index PdDistrict         XX          YY
0      0    CENTRAL  37.798732 -122.409919
1      1    TARAVAL  37.743733 -122.481500
2      3    CENTRAL  37.798732 -122.409919
3      2    BAYVIEW  37.729732 -122.397981

Additionally, sort=False is the default behavior -- you don't need to specify that.