Don Smythe - 3 months ago
Python Question

pandas convert text feature to numeric value

I can convert all text features in a pandas DataFrame by casting them to 'category' with the df.astype() method, as below. However, I find the category dtype hard to work with (e.g. for plotting data) and would prefer to create a new column of integers.

# convert all object columns to categories
object_types = dataset.select_dtypes(include=['O'])
for col in object_types:
    dataset['{0}_category'.format(col)] = dataset[col].astype('category')

I can convert the text to integers using this hack:

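# convert all object columns to int values
object_types = dataset.select_dtypes(include=['O'])

new_cols = {}
for col in object_types:
    # assign an arbitrary integer to each distinct value in the column
    data_set = set(dataset[col].tolist())
    data_indexed = {}
    for i, item in enumerate(data_set):
        data_indexed[item] = i
    # look up the integer for every row
    new_list = []
    for item in dataset[col].tolist():
        new_list.append(data_indexed[item])
    new_cols[col] = new_list

# add one integer column per original text column
for key, val in new_cols.items():
    dataset['{0}_int_value'.format(key)] = val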

But is there a better (or existing) way to do the same?


Consider df:

import numpy as np
import pandas as pd

df = pd.DataFrame(dict(A=list('aaaabbbbcccc')))



You can convert to integers like this:

def intify(s):
    u = np.unique(s)
    i = np.arange(len(u))
    return s.map(dict(zip(u, i)))

Or a shorter version:

def intify(s):
    u = np.unique(s)
    return s.map({k: i for i, k in enumerate(u)})


Or in a single line:

df.apply(lambda s: s.map({k: i for i, k in enumerate(s.unique())}))
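One thing worth checking before picking a variant: the function version and the one-liner need not assign the same integers. np.unique returns the distinct values sorted, while Series.unique keeps them in order of first appearance. Here is a minimal sketch of both applied with df.apply; the frame `example` and its values are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

def intify(s):
    # np.unique returns the distinct values of the column in sorted order
    u = np.unique(s)
    # map each value to its position in that sorted array
    return s.map({k: i for i, k in enumerate(u)})

# Hypothetical frame, for illustration only
example = pd.DataFrame({'A': list('bbaacc'), 'B': list('xyzxyz')})

# apply calls the function once per column and returns a frame of integers
by_sorted = example.apply(intify)

# s.unique() keeps order of first appearance, so 'b' gets 0 in column A
by_appearance = example.apply(
    lambda s: s.map({k: i for i, k in enumerate(s.unique())})
)

print(by_sorted['A'].tolist())      # [1, 1, 0, 0, 2, 2]
print(by_appearance['A'].tolist())  # [0, 0, 1, 1, 2, 2]
```

If the integers only need to be distinct per value, either works; if they should follow alphabetical order, the np.unique version is the one to use.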

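As for an existing way: pandas has two built-ins that may cover this without a hand-written mapping, pd.factorize and the .cat.codes accessor on a categorical column. This is a different technique from the map-based approach above, shown only as a sketch. The frame `example` and the new column names are hypothetical.

```python
import pandas as pd

# Hypothetical frame, for illustration only
example = pd.DataFrame({'A': list('bbaacc')})

# pd.factorize returns (codes, uniques); codes follow order of first appearance
codes, uniques = pd.factorize(example['A'])
example['A_factorized'] = codes

# .cat.codes exposes the integer code behind each category;
# for string data the inferred categories are sorted
example['A_cat_codes'] = example['A'].astype('category').cat.codes

print(example)
```

Since the question already creates category columns, .cat.codes may be the smallest change: the integers are there already, one attribute away.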