Raphadasilva Raphadasilva - 1 month ago 5
Python Question

Correspondence table with Pandas

Recently, I used pandas to manipulate a csv with electoral data. In my DataFrame, I've got a column 'winner' for each city of an area.

It could be represented as :

city winner
0 city1 party1
1 city2 party3
2 city3 party1
3 city4 party2
4 city5 party1
...
5188 city5189 party3


Here's the thing: I want to create a new column, named "coulour". The goal is to have an unique colour per city, depending on 'winner' value.

To do that I could use if/elif statements, but I was wondering if it's posible to use a correspondence table. For example, if I've got my huge DataFrame on one hand, and this little one on the other :

winner colour
0 party1 #000
1 party2 #fff
2 party3 #c0c0c0


Could I use one fonction of pandas to match display the good value of 'colour' if values of 'winner' are matching in the two DataFrames ?

I've tried map() function and get_loc(), but it didn't work very well.

Answer

You can use map by Series created from df2:

print (df2.set_index('winner')['colour'])
winner
party1       #000
party2       #fff
party3    #c0c0c0
Name: colour, dtype: object

df1['new'] = df1.winner.map(df2.set_index('winner')['colour'])
print (df1)
          city  winner      new
0        city1  party1     #000
1        city2  party3  #c0c0c0
2        city3  party1     #000
3        city4  party2     #fff
4        city5  party1     #000
5188  city5189  party3  #c0c0c0

Another solution with merge (on='winner' can be omit if only common column in both DataFrame is winner):

print (pd.merge(df1,df2, how='left'))
       city  winner   colour
0     city1  party1     #000
1     city2  party3  #c0c0c0
2     city3  party1     #000
3     city4  party2     #fff
4     city5  party1     #000
5  city5189  party3  #c0c0c0