adhg - 2 months ago
Python Question

Convert a column's data to enumerated dictionary key-value pairs

Is there a better way (in the sense of less code) to do the following: convert a column to enumerated numerical values? It should go roughly like this:


  1. get the set of distinct items in a column

  2. make an enumerated dictionary of key-value pairs

  3. swap the keys and values

  4. use the resulting mapping to put the numbers, instead of the original data, in a new column.
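The four steps can be sketched in plain Python on a list of values. This is a minimal illustration, not the asker's code: the helper name `encode` is hypothetical, and the distinct values are sorted so the numbering is the same on every run (iterating a raw `set` of strings can give a different order between interpreter runs because string hashing is randomized).

```python
def encode(values):
    """Hypothetical helper: map each distinct value to an integer code."""
    # Step 1: collect the distinct items (sorted for a reproducible numbering)
    distinct = sorted(set(values))
    # Step 2: enumerate them -> {0: 'BLACK', 1: 'BLUE', ...}
    index_to_value = dict(enumerate(distinct))
    # Step 3: swap keys and values -> {'BLACK': 0, 'BLUE': 1, ...}
    value_to_index = {value: index for index, value in index_to_value.items()}
    # Step 4: replace every item with its code
    return [value_to_index[v] for v in values], value_to_index


colors = ["RED", "RED", "RED", "RED", "GREEN", "BLACK", "BLUE", "BLUE"]
codes, mapping = encode(colors)
print(codes)    # [3, 3, 3, 3, 2, 0, 1, 1]
print(mapping)  # {'BLACK': 0, 'BLUE': 1, 'GREEN': 2, 'RED': 3}
```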



Here's what I do today. Can anyone show a more standard way to do this, so I can avoid writing the function get_color_val?

```python
import pandas as pd

cars = pd.DataFrame({"car_name": ["BMW", "BMW", "ACCURA", "ACCURA", "ACCURA", "BMW", "BMW", "BMW"],
                     "color": ["RED", "RED", "RED", "RED", "GREEN", "BLACK", "BLUE", "BLUE"]})

# Enumerate the distinct colors: {0: color, 1: color, ...}
color_dict = dict(enumerate(set(cars["color"])))
# Swap keys and values: {color: number, ...}
color_dict = dict((y, x) for x, y in color_dict.items())

def get_color_val(row):
    my_key = row["color"]
    my_value = color_dict.get(my_key)
    return my_value

cars["color_val"] = cars.apply(get_color_val, axis=1)
cars = cars.drop(columns="color")
print(cars)
```
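Since `color_dict` is already a value-to-code mapping, the row-wise `apply` and the `get_color_val` helper are not strictly needed: `Series.map` looks each value up in a dict directly. A minimal sketch that keeps the same dictionary construction, so, as in the original, the numbers depend on the `set` iteration order:

```python
import pandas as pd

cars = pd.DataFrame({
    "car_name": ["BMW", "BMW", "ACCURA", "ACCURA", "ACCURA", "BMW", "BMW", "BMW"],
    "color": ["RED", "RED", "RED", "RED", "GREEN", "BLACK", "BLUE", "BLUE"],
})

# Same color -> number mapping as above, built in one dict comprehension
color_dict = {color: code for code, color in enumerate(set(cars["color"]))}

# Series.map replaces each color by its code; no per-row function is needed
cars["color_val"] = cars["color"].map(color_dict)
cars = cars.drop(columns="color")
print(cars)
```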



Result


```
Before:
  car_name  color
0      BMW    RED
1      BMW    RED
2   ACCURA    RED
3   ACCURA    RED
4   ACCURA  GREEN
5      BMW  BLACK
6      BMW   BLUE
7      BMW   BLUE

After:
  car_name  color_val
0      BMW          3
1      BMW          3
2   ACCURA          3
3   ACCURA          3
4   ACCURA          2
5      BMW          1
6      BMW          0
7      BMW          0
```

Answer

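I would use pd.factorize() in this case. It returns a tuple (codes, uniques); [0] takes the integer codes, which are assigned in order of first appearance: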

```
In [8]: cars['color_val'] = pd.factorize(cars.color)[0]

In [9]: cars
Out[9]:
  car_name  color  color_val
0      BMW    RED          0
1      BMW    RED          0
2   ACCURA    RED          0
3   ACCURA    RED          0
4   ACCURA  GREEN          1
5      BMW  BLACK          2
6      BMW   BLUE          3
7      BMW   BLUE          3
```
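`pd.factorize` also returns the distinct values (`uniques`), so the mapping is not lost if the original column is dropped. A short sketch (the variable names are just local choices): `sort=True` numbers the values alphabetically instead of by first appearance, and indexing `uniques` with the codes recovers the original colors. The `astype("category").cat.codes` line is an alternative that also yields the alphabetical numbering here.

```python
import pandas as pd

cars = pd.DataFrame({
    "car_name": ["BMW", "BMW", "ACCURA", "ACCURA", "ACCURA", "BMW", "BMW", "BMW"],
    "color": ["RED", "RED", "RED", "RED", "GREEN", "BLACK", "BLUE", "BLUE"],
})

# Codes follow order of first appearance: RED=0, GREEN=1, BLACK=2, BLUE=3
codes, uniques = pd.factorize(cars["color"])
print(codes)          # [0 0 0 0 1 2 3 3]
print(list(uniques))  # ['RED', 'GREEN', 'BLACK', 'BLUE']

# sort=True numbers the sorted unique values: BLACK=0, BLUE=1, GREEN=2, RED=3
sorted_codes, sorted_uniques = pd.factorize(cars["color"], sort=True)
print(sorted_codes)   # [3 3 3 3 2 0 1 1]

# Indexing uniques with the codes maps the numbers back to the colors
cars["color_val"] = sorted_codes
decoded = list(sorted_uniques[cars["color_val"].to_numpy()])
print(decoded)

# Alternative: a categorical column's codes (string categories are sorted)
cat_codes = cars["color"].astype("category").cat.codes
```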