user6215669 - 5 months ago
Python Question

Replace unique values of pandas data-frame

Hi, I'm new to Python and pandas.

I have extracted the unique values of one of the columns using pandas.
The unique values are strings:

['Others, Senior Management-Finance, Senior Management-Sales'
'Consulting, Strategic planning, Senior Management-Finance'
'Client Servicing, Quality Control - Product/ Process, Strategic
planning'
'Administration/ Facilities, Business Analytics, Client Servicing'
'Sales & Marketing, Sales/ Business Development/ Account Management,
Sales Support']


I want to replace each string value with a unique integer.

For simplicity, here is a dummy input and the expected output.

Input:

Col1
A
A
B
B
B
C
C


The unique values of the column come out as below:

[ 'A' 'B' 'C' ]


After replacing, the column should look like this:

Col1
1
1
2
2
2
3
3


Please suggest how I can do this, using a loop or any other way, because I have more than 300 unique values.

Answer

Use factorize:

# factorize returns (codes, uniques); codes start at 0, so add 1
df['Col1'] = pd.factorize(df.Col1)[0] + 1
print(df)
   Col1
0     1
1     1
2     2
3     2
4     2
5     3
6     3

See "Factorizing values" in the pandas documentation.
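If you also need to know which integer stands for which string, the second value returned by factorize can be kept. The following is a minimal sketch on the dummy data from the question; the name `code_to_label` is only illustrative and not part of pandas.

```python
import pandas as pd

# Dummy data from the question
df = pd.DataFrame({"Col1": ["A", "A", "B", "B", "B", "C", "C"]})

# factorize returns (codes, uniques); codes are 0-based,
# assigned in order of first appearance
codes, uniques = pd.factorize(df["Col1"])

# Illustrative lookup from each 1-based integer back to its original string
code_to_label = {i + 1: label for i, label in enumerate(uniques)}

df["Col1"] = codes + 1
print(df)
print(code_to_label)
```

Note that factorize marks missing values with -1 by default, so after adding 1 they would show up as 0.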

Another solution uses numpy.unique, but it is slower on a huge DataFrame:

# return_inverse gives, for each row, the index of its value in the sorted uniques
_, idx = np.unique(df['Col1'], return_inverse=True)
df['Col1'] = idx + 1
print(df)
   Col1
0     1
1     1
2     2
3     2
4     2
5     3
6     3
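One difference between the two approaches is worth keeping in mind: factorize numbers the values in order of first appearance, while numpy.unique numbers them in sorted order. On the dummy data both give the same result only because A, B, C already appear sorted. A small sketch with made-up, unsorted data (the series `s` is an assumption for illustration):

```python
import numpy as np
import pandas as pd

# Made-up data where the first value seen is not the smallest
s = pd.Series(["B", "A", "B", "C", "A"])

# Order of first appearance: B=1, A=2, C=3
by_appearance = pd.factorize(s)[0] + 1

# Sorted order: A=1, B=2, C=3
_, inverse = np.unique(s.to_numpy(), return_inverse=True)
by_sorted = inverse + 1

# factorize can produce the sorted numbering as well
by_sorted_factorize = pd.factorize(s, sort=True)[0] + 1

print(by_appearance.tolist())
print(by_sorted.tolist())
```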

Finally, you can convert the values to categorical, mainly to reduce memory usage:

df['Col1'] = pd.factorize(df.Col1)[0]
df['Col1'] = df['Col1'].astype("category")
print(df)
  Col1
0    0
1    0
2    1
3    1
4    1
5    2
6    2

print(df.dtypes)
Col1    category
dtype: object
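A related option, not shown above, is to convert the original strings to categorical directly and read the integer codes from the `.cat` accessor. This is only a sketch on the dummy data; the column name `Col1_code` and the `labels` dict are illustrative.

```python
import pandas as pd

# Dummy data from the question
df = pd.DataFrame({"Col1": ["A", "A", "B", "B", "B", "C", "C"]})

# Convert the strings themselves to a categorical column
cat = df["Col1"].astype("category")

# .cat.codes holds a 0-based integer per row; categories are in sorted order here
df["Col1_code"] = cat.cat.codes + 1

# Illustrative mapping from each integer back to its string
labels = dict(enumerate(cat.cat.categories, start=1))

print(df)
print(labels)
```

This keeps the original strings available as categories while still giving one integer per unique value.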