Giulio - 9 months ago 43

Python Question

I'm new to Python.

I'm trying to plot, using matplotlib, the results of a linear regression.

I've tried with some basic data and it worked, but when I try with the actual data, the regression line is completely wrong. I think I'm doing something wrong with the fit() or predict() functions.

This is the code:

import matplotlib.pyplot as plt

from sklearn import linear_model

import scipy

import numpy as np

regr=linear_model.LinearRegression()

A=[[69977, 4412], [118672, 4093], [127393, 12324], [226158, 15453], [247883, 8924], [228057, 6568], [350119, 4040], [197808, 6793], [205989, 8471], [10666, 632], [38746, 1853], [12779, 611], [38570, 1091], [38570, 1091], [95686, 8752], [118025, 17620], [79164, 13335], [83051, 1846], [4177, 93], [29515, 1973], [75671, 5070], [10077, 184], [78975, 4374], [187730, 17133], [61558, 2521], [34705, 1725], [206514, 10548], [13563, 1734], [134931, 7117], [72527, 6551], [16014, 310], [20619, 403], [21977, 437], [20204, 258], [20406, 224], [20551, 375], [38251, 723], [20416, 374], [21125, 429], [20405, 235], [20042, 431], [20016, 366], [19702, 200], [20335, 420], [21200, 494], [22667, 487], [20393, 405], [20732, 414], [20602, 393], [111705, 7623], [112159, 5982], [6750, 497], [59624, 418], [111468, 10209], [40057, 1484], [435, 0], [498848, 17053], [26585, 1390], [75170, 3883], [139146, 3540], [84931, 7214], [19144, 3125], [31144, 2861], [66573, 818], [114253, 4155], [15421, 2094], [307497, 5110], [484904, 10273], [373476, 36365], [128152, 10920], [517285, 106315], [453483, 10054], [270763, 17542], [9068, 362], [61992, 1608], [35791, 1747], [131215, 6227], [4314, 191], [16316, 2650], [72791, 2077], [47008, 4656], [10853, 1346], [66708, 4855], [214736, 11334], [46493, 4236], [23042, 737], [335941, 11177], [65167, 2433], [94913, 7523], [454738, 12335]]

# My data are selected from a MySQL DB and stored in a NumPy array like the one above.

regr.fit(A,A[:,1])

plt.scatter(A[:,0],A[:,1], color='black')

plt.plot(A[:,1],regr.predict(A), color='blue',linewidth=3)

plt.show()

What I want is a regression line using the data from the first column of A and the second column. And this is the result:

I know that the presence of outliers can heavily impact the output, but I tried other regression tools and the regression line was a lot closer to the area where the points are, so I'm sure I'm missing something.

Thank you.

EDIT 1: As suggested, I tried again, changing only the plot() parameter. Instead of A[:,1] I used A[:,0], and this is the result:

A simple example at scikit-learn.org/stable/modules/linear_model.html looks like mine. I don't need prediction, so I didn't split my data into training and test sets. Maybe I'm misunderstanding the meaning of "X, y", but again, looking at the example in the link, it looks like mine.

EDIT 2: Finally it worked.

X=A[:,0]

X=X[:,np.newaxis]

regr=linear_model.LinearRegression()

regr.fit(X,A[:,1])

plt.plot(X,regr.predict(X))

The X parameter just needs to be a 2-dimensional array. The example in EDIT 1 really misled me :(.
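To make the shape requirement concrete, here is a small sketch (not part of the original post) of a few ways to turn a single column into the 2D `(n_samples, n_features)` layout that scikit-learn estimators expect for `X`. The three-row array is just a sample of the data above, and the variable names are illustrative.

```python
import numpy as np

# A few (x, y) rows taken from the question's data, for illustration only.
A = np.array([[69977, 4412], [118672, 4093], [127393, 12324]])

x_1d = A[:, 0]                    # shape (3,): a flat vector, not a feature matrix
X_newaxis = x_1d[:, np.newaxis]   # shape (3, 1): one row per sample, one column
X_reshape = x_1d.reshape(-1, 1)   # same result; -1 lets NumPy infer the row count
X_slice = A[:, [0]]               # indexing with a list keeps the column dimension

print(x_1d.shape, X_newaxis.shape, X_reshape.shape, X_slice.shape)
```

All three 2D variants hold the same values; which one to use is mostly a matter of taste.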

Answer Source

You seem to be including the target values `A[:, 1]` in your training data. The fitting command is of the form `regr.fit(X, y)`.
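As a rough illustration of why this matters (a sketch on six rows of the question's data; the names `leaky` and `proper` are made up here): when the target column is also passed as a feature, the model can simply copy it, so the fit looks perfect but says nothing about how column 0 relates to column 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Six rows from the question's data, for illustration only.
A = np.array([[69977, 4412], [118672, 4093], [127393, 12324],
              [226158, 15453], [247883, 8924], [228057, 6568]], dtype=float)
y = A[:, 1]

# Target included among the features: the model learns y = 0*col0 + 1*col1.
leaky = LinearRegression().fit(A, y)
print(leaky.coef_)         # approximately [0, 1]

# Only the predictor column as X (kept 2D), target passed separately as y.
X = A[:, [0]]
proper = LinearRegression().fit(X, y)
print(proper.coef_, proper.intercept_)
```

With the leaky fit, `leaky.predict(A)` just reproduces `y`, which is why plotting it against either column does not give a straight regression line.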

You also seem to have a problem with this line:

`plt.plot(A[:,1],regr.predict(A), color='blue',linewidth=3)`

I think you should replace `A[:, 1]` with `A[:, 0]` if you want to plot your prediction against the predictor values.

You may find it easier to split your data into `X` and `y` at the beginning; it may make things clearer.
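One possible way to structure that, as a minimal sketch: `fit_line` and `plot_fit` are hypothetical helper names introduced here, not part of scikit-learn or the original code, and the six rows are only a sample of the question's data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def fit_line(data):
    """Split an (n, 2) array into X and y, fit a line, return (X, y, model)."""
    data = np.asarray(data, dtype=float)
    X = data[:, [0]]   # predictor, kept 2D: shape (n, 1)
    y = data[:, 1]     # target, 1D: shape (n,)
    model = LinearRegression().fit(X, y)
    return X, y, model


def plot_fit(X, y, model):
    """Scatter the data and draw the fitted line over it."""
    import matplotlib.pyplot as plt  # imported here so fit_line works without it

    order = np.argsort(X[:, 0])      # sort by x so the line is drawn left to right
    plt.scatter(X[:, 0], y, color="black")
    plt.plot(X[order, 0], model.predict(X)[order], color="blue", linewidth=3)
    plt.show()


# Six rows from the question's data, for illustration only.
A = [[69977, 4412], [118672, 4093], [127393, 12324],
     [226158, 15453], [247883, 8924], [228057, 6568]]
X, y, model = fit_line(A)
print(model.coef_[0], model.intercept_)
```

Calling `plot_fit(X, y, model)` afterwards should draw the scatter plot with the regression line. Both the fit and the plot then use the same `X`, which avoids mixing up columns as in the original `plt.plot` call.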