ahajib ahajib - 1 month ago 19
Python Question

Normalize columns of pandas data frame

I have a data frame in pandas in which each column has a different value range. For example:

df:

A B C
1000 10 0.5
765 5 0.35
800 7 0.09


Any idea how I can normalize the columns of this data frame so that each value is between 0 and 1?

My desired output is:

A B C
1 1 1
0.765 0.5 0.7
0.8 0.7 0.18 (which is 0.09/0.5)

Answer

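You can use the scikit-learn package (sklearn) and its preprocessing utilities to normalize the data. MinMaxScaler rescales each column to the range [0, 1] by computing (x - min) / (max - min), so the smallest value in each column becomes 0 and the largest becomes 1. Note that this is not the same as dividing each column by its maximum, which is what the desired output in the question shows.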

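import pandas as pd
from sklearn import preprocessing

x = df.values  # returns a NumPy array
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)  # scales each column to [0, 1]
# rebuild the data frame, keeping the original column names and index
df = pd.DataFrame(x_scaled, columns=df.columns, index=df.index)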

For more information, see the documentation: http://scikit-learn.org/stable/modules/preprocessing.html#scaling-features-to-a-range
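If you would rather stay in pandas, both kinds of scaling can be written with plain column arithmetic. The following is only a minimal sketch, not part of the original answer: the names max_scaled and min_max_scaled are made up for illustration, and it assumes that all values are non-negative, that no column maximum is zero, and that no column is constant.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": [1000, 765, 800],
    "B": [10, 5, 7],
    "C": [0.5, 0.35, 0.09],
})

# Divide each column by its maximum. This matches the desired output
# in the question (assumes non-negative values and non-zero maxima).
max_scaled = df / df.max()

# Min-max scaling: (x - min) / (max - min) per column.
# Assumes no column is constant, otherwise this divides by zero.
min_max_scaled = (df - df.min()) / (df.max() - df.min())

print(max_scaled)
print(min_max_scaled)
```

Here df.max() and df.min() return one value per column, and pandas aligns them with the columns of df, so each column is scaled by its own statistics. If you want the divide-by-maximum behaviour inside scikit-learn, preprocessing.MaxAbsScaler divides each column by its maximum absolute value, which gives the same result for non-negative data.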
