caposeidon - 6 months ago 82

Python Question

I am currently using Scikit-Learn's LogisticRegression to build a model. I have used

`from sklearn import preprocessing`

`scaler = preprocessing.StandardScaler().fit(build)`

`build_scaled = scaler.transform(build)`

to scale all of my input variables prior to training the model. Everything works fine and produces a decent model, but my understanding is that the coefficients produced by LogisticRegression.coef_ are based on the scaled variables. Is there a transformation that can be applied to those coefficients so that they can be used directly on the non-scaled data?

I am thinking ahead to an implementation of the model in a production system, and am trying to determine whether all of the variables need to be pre-processed in some way in production in order to score the model.

Note: the model will likely have to be re-coded within the production environment, and that environment does not use Python.

Answer

Short answer: to get the LogisticRegression coefficients and intercept for unscaled data (assuming binary classification, and that lr is a trained LogisticRegression object):

Divide your coefficient array element-wise by the scaler.scale_ array (available since v0.17):

`coefficients = np.true_divide(lr.coef_, scaler.scale_)`

Then subtract from your intercept the inner product of the resulting coefficients array (the result of the division) with the scaler.mean_ array:

`intercept = lr.intercept_ - np.dot(coefficients, scaler.mean_)`
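As a sanity check, the two steps above can be tried end to end. The following is only a minimal sketch: the synthetic data from `make_classification`, the per-feature offsets and spreads, and the names `X`, `y`, `logits_raw`, `probs_raw` and `probs_sklearn` are assumptions made for illustration and are not part of the original question. It scores the raw data with the converted parameters and compares the result with what scikit-learn predicts on the scaled data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data (illustrative assumption).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
# Give the features different spreads and offsets so that scaling matters.
X = X * np.array([1.0, 10.0, 0.5, 100.0]) + np.array([0.0, 5.0, -3.0, 50.0])

# Fit the scaler and train the model on the scaled data, as in the question.
scaler = StandardScaler().fit(X)
lr = LogisticRegression().fit(scaler.transform(X), y)

# Step 1: divide the coefficients element-wise by the per-feature scale.
coefficients = np.true_divide(lr.coef_, scaler.scale_)
# Step 2: subtract the inner product of the new coefficients with the means.
intercept = lr.intercept_ - np.dot(coefficients, scaler.mean_)

# Score the raw, unscaled data with the converted parameters.
logits_raw = X @ coefficients.ravel() + intercept[0]
probs_raw = 1.0 / (1.0 + np.exp(-logits_raw))

# Reference: scikit-learn's own probabilities on the scaled data.
probs_sklearn = lr.predict_proba(scaler.transform(X))[:, 1]

# The two sets of probabilities should agree up to floating-point error.
print(np.allclose(probs_raw, probs_sklearn))
```

If the check passes, only the converted `coefficients` and `intercept` need to be carried over to the production system; the scoring step is then a dot product followed by a sigmoid, with no separate scaling stage.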

You can see why the above is needed if you consider that every feature is normalized by subtracting its mean (stored in the scaler.mean_ array) and then dividing by its standard deviation (stored in the scaler.scale_ array).
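Written out, the reasoning looks roughly like this. The symbols are my own shorthand rather than anything from scikit-learn: $w_j$ and $b$ stand for the entries of lr.coef_ and lr.intercept_, while $\mu_j$ and $\sigma_j$ stand for the entries of scaler.mean_ and scaler.scale_. The model's linear term on the scaled features can be regrouped into a linear term on the raw features $x_j$:

```latex
\begin{aligned}
b + \sum_j w_j \,\frac{x_j - \mu_j}{\sigma_j}
  &= b + \sum_j \frac{w_j}{\sigma_j}\, x_j - \sum_j \frac{w_j}{\sigma_j}\, \mu_j \\
  &= \underbrace{\Bigl(b - \sum_j \frac{w_j}{\sigma_j}\, \mu_j\Bigr)}_{\text{new intercept}}
     + \sum_j \underbrace{\frac{w_j}{\sigma_j}}_{\text{new coefficient}} x_j
\end{aligned}
```

The new coefficient $w_j / \sigma_j$ is the element-wise division, and the new intercept is the original intercept minus the inner product of the new coefficients with the means.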