hkhr hkhr - 3 months ago 12
Python Question

Trouble writing sklearn code equivalent to GraphLab Create code; mainly unable to plot the fit properly

I am having a lot of trouble plotting crime rate vs. house price together with the fitted regression line.
With the GraphLab library this is easy, but I am unable to do it using sklearn.
Here is my code using sklearn:

import sklearn.linear_model  # a bare "import sklearn" may not load the linear_model submodule
import sframe
from sframe import SFrame
import pandas as pd


# #Load some house value vs. crime rate data
#
# Dataset is from Philadelphia, PA and includes average house sales price in a number of neighborhoods. The attributes of each neighborhood we have include the crime rate ('CrimeRate'), miles from Center City ('MilesPhila'), town name ('Name'), and county name ('County').


sales = pd.read_csv('Philadelphia_Crime_Rate_noNA.csv')


sales[:2]


# #Exploring the data

# The house price in a town is correlated with the crime rate of that town. Low crime towns tend to be associated with higher house prices and vice versa.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

plt.scatter(x=sales['CrimeRate'], y=sales['HousePrice'])

crime_model =sklearn.linear_model.LinearRegression()


# #Let's see what our fit looks like

X=sales.drop(['Name','County'],axis=1)
X=X.dropna()

crime_rate=X['CrimeRate']
price=X['HousePrice']

# .values gives a NumPy array; newer pandas versions no longer support Series.reshape
crime_model.fit(crime_rate.values.reshape(-1,1),X.HousePrice)

plt.plot(X['CrimeRate'],X['HousePrice'],'.',
X['CrimeRate'],crime_model.predict(X),'-')


The output I am getting with sklearn is not correct.

The output I am looking for can be produced with GraphLab Create.

This is the complete code, which runs properly with GraphLab Create:

import graphlab

sales = graphlab.SFrame.read_csv('Philadelphia_Crime_Rate_noNA.csv')

sales

graphlab.canvas.set_target('ipynb')
sales.show(view="Scatter Plot", x="CrimeRate", y="HousePrice")

crime_model = graphlab.linear_regression.create(sales, target='HousePrice', features=['CrimeRate'],validation_set=None,verbose=False)

import matplotlib.pyplot as plt
%matplotlib inline



plt.plot(sales['CrimeRate'],sales['HousePrice'],'.',
sales['CrimeRate'],crime_model.predict(sales),'-')


# Above: blue dots are original data, green line is the fit from the simple regression.


I hope someone can point out my mistake.
Thanks.

This is the dataset:

HousePrice HsPrc CrimeRate MilesPhila PopChg Name County
140463 14.0463 29.7 10 -1 Abington Montgome
113033 11.3033 24.1 18 4 Ambler Montgome
124186 12.4186 19.5 25 8 Aston Delaware
110490 11.049 49.4 25 2.7 Bensalem Bucks
79124 7.9124 54.1 19 3.9 Bristol B. Bucks
92634 9.2634 48.6 20 0.6 Bristol T. Bucks
89246 8.9246 30.8 15 -2.6 Brookhaven Delaware
195145 19.5145 10.8 20 -3.5 Bryn Athyn Montgome
297342 29.7342 20.2 14 0.6 Bryn Mawr Montgome
264298 26.4298 20.4 26 6 Buckingham Bucks
134342 13.4342 17.3 31 4.2 Chalfont Bucks
147600 14.76 50.3 9 -1 Cheltenham Montgome
77370 7.737 34.2 10 -1.2 Clifton Delaware
170822 17.0822 33.7 32 2.4 Collegeville Montgome
40642 4.0642 45.7 15 0 Darby Bor. Delaware
71359 7.1359 22.3 8 1.6 Darby Town Delaware
104923 10.4923 48.1 21 6.9 Downingtown Chester
190317 19.0317 19.4 26 1.9 Doylestown Bucks
215512 21.5512 71.9 26 5.8 E. Bradford Chester
178105 17.8105 45.1 25 2.3 E. Goshen Chester
131025 13.1025 31.3 19 -1.8 E. Norriton Montgome
149844 14.9844 24.9 22 6.4 E. Pikeland Chester
170556 17.0556 27.2 30 4.6 E. Whiteland Chester
280969 28.0969 17.7 14 2.9 Easttown Chester
114233 11.4233 29 30 1.3 Falls Town Bucks
74502 7.4502 21.4 15 -3.2 Follcroft Delaware
475112 47.5112 28.6 12 Gladwyne Montgome
97167 9.7167 29.3 10 0.2 Glenolden Delaware
114572 11.4572 17.5 20 5.2 Hatboro Montgome
436348 43.6348 16.5 10 -0.7 Haverford Delaware
389302 38.9302 17.8 20 1.5 Horsham Montgome
122392 12.2392 17.3 10 1.9 Jenkintown Montgome
130436 13.0436 31.2 17 -0.4 L Southampton Delaware
272790 27.279 14.5 20 -5.1 L. Gwynedd Montgome
194435 19.4435 15.7 32 15 L. Makefield Bucks
299621 29.9621 28.6 10 1.4 L. Merion Montgome
210884 21.0884 20.8 20 0.1 L. Moreland Montgome
112471 11.2471 29.3 35 3.4 Lansdale Montgome
93738 9.3738 19.3 7 -0.4 Lansdown Delaware
121024 12.1024 39.5 35 26.9 Limerick Montgome
156035 15.6035 13 23 6.3 Malvern Chester
185404 18.5404 24.1 10 0.9 Marple Delaware
126160 12.616 38 20 -2.4 Media Delaware
143072 14.3072 40.1 23 1.6 Middletown Bucks
96769 9.6769 36.1 15 5.1 Morrisville Bucks
94014 9.4014 26.6 14 0.5 Morton Delaware
118214 11.8214 25.1 25 5.7 N. Wales Montgome
157446 15.7446 14.6 15 3.1 Narberth Montgome
150283 15.0283 18.2 15 0.9 Nether Delaware
153842 15.3842 15.3 23 8.5 Newtown Bucks
197214 19.7214 15.2 25 2.1 Newtown B. Bucks
206127 20.6127 17.4 15 2.7 Newtown T. Delaware
71981 7.1981 73.3 19 4.9 Norristown Montgome
169401 16.9401 7.1 22 1.5 Northampton Bucks
99843 9.9843 12.5 12 -3.7 Norwood Delaware
60000 6 45.8 18 -1.4 Phila, Far NE Phila
28000 2.8 44.9 5.5 -8.4 Phila, N Phila
60000 6 65 9 -4.9 Phila, NE Phila
61800 6.18 49.9 9 -6.4 Phila, NW Phila
38000 3.8 54.8 4.5 -5.1 Phila, SW Phila
38000 3.8 53.5 2 -9.2 Phila, South Phila
42000 4.2 69.9 4 -5.7 Phila, West Phila
96200 9.62 366.1 0 4.8 Phila,CC Phila
103087 10.3087 24.6 24 3.9 Phoenixville Chester
147720 14.772 58.6 25 1.5 Plymouth Montgome
78175 7.8175 53.2 41 2.2 Pottstown Montgome
92215 9.2215 17.4 14 7.8 Prospect Park Delaware
271804 27.1804 15.5 17 1.2 Radnor Delaware
119566 11.9566 14.5 12 -2.9 Ridley Park Delaware
100231 10.0231 24.1 15 1.9 Ridley Town Delaware
95831 9.5831 21.2 32 3.2 Royersford Montgome
229711 22.9711 9.8 22 5.3 Schuylkill Chester
74308 7.4308 29.9 7 1.8 Sharon Hill Delaware
259506 25.9506 7.2 40 17.4 Solebury Bucks
159573 15.9573 19.4 15 -2.1 Springfield Montgome
147176 14.7176 41.1 12 -1.7 Springfield Delaware
205732 20.5732 11.2 12 -0.2 Swarthmore Delaware
215783 21.5783 21.2 20 1.1 Tredyffin Chester
116710 11.671 42.8 20 12.9 U. Chichester Delaware
359112 35.9112 9.4 36 4 U. Makefield Bucks
189959 18.9959 61.7 22 -2.1 U. Merion Montgome
133198 13.3198 19.4 22 -2 U. Moreland Montgome
242821 24.2821 6.6 21 1.6 U. Providence Delaware
142811 14.2811 15.9 20 -1.6 U. Southampton Bucks
200498 20.0498 18.8 36 11 U. Uwchlan Chester
199065 19.9065 13.2 20 7.8 Upper Darby Montgome
93648 9.3648 34.5 8 -0.7 Upper Darby Delaware
163001 16.3001 22.1 50 8 Uwchlan T. Chester
436348 43.6348 22.1 15 1.3 Villanova Montgome
124478 12.4478 71.9 22 4.6 W. Chester Chester
168276 16.8276 31.9 26 5.9 W. Goshen Chester
114157 11.4157 44.6 38 14.6 W. Whiteland Chester
130088 13.0088 28.6 19 -0.2 Warminster Bucks
152624 15.2624 24 19 23.1 Warrington Bucks
174232 17.4232 13.8 25 4.7 Westtown Chester
196515 19.6515 29.9 16 1.8 Whitemarsh Montgome
232714 23.2714 9.9 21 0.2 Willistown Chester
245920 24.592 22.6 10 0.3 Wynnewood Montgome
130953 13.0953 13 24 5.2 Yardley Bucks

Answer

plt.plot(X['CrimeRate'],X['HousePrice'],'.', X['CrimeRate'],crime_model.predict(X),'-')

I made a mistake in the line above: predict should receive X['CrimeRate'] as input, but I passed the whole frame (X). After replacing X with X['CrimeRate'], it works properly.
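
To illustrate why passing the whole frame goes wrong, here is a minimal sketch on made-up numbers (not the Philadelphia data, and the variable names are only for illustration). A model fitted on one feature column expects one column at prediction time. In the scikit-learn versions I am aware of, a column-count mismatch raises a ValueError, though the exact message depends on the version.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data for illustration only: one feature with a roughly linear target.
rng = np.random.default_rng(0)
crime = rng.uniform(5, 75, size=20)
price = 250000 - 2000 * crime + rng.normal(0, 10000, size=20)

model = LinearRegression()
model.fit(crime.reshape(-1, 1), price)  # fitted on exactly one feature column

# Stand-in for the whole frame: three columns instead of the one used in fit.
wide = np.column_stack([price, crime, crime])
try:
    model.predict(wide)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

# A single-column 2-D input matches what the model was fitted on.
pred = model.predict(crime.reshape(-1, 1))
print(mismatch_detected, pred.shape)
```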

The correct version is:

plt.plot(X['CrimeRate'],X['HousePrice'],'.',
    X['CrimeRate'],crime_model.predict(X['CrimeRate']),'-')
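
A note for newer library versions: recent scikit-learn releases expect a 2-D feature input for both fit and predict, so a 1-D X['CrimeRate'] may be rejected there. The sketch below is one way to write the same plot under that assumption, using double brackets to keep the feature as a one-column frame. It uses only the first five rows of the dataset from the question, so the fitted line will differ from the one on the full data.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression

# First five rows copied from the dataset in the question.
X = pd.DataFrame({
    "HousePrice": [140463, 113033, 124186, 110490, 79124],
    "CrimeRate": [29.7, 24.1, 19.5, 49.4, 54.1],
})

crime_model = LinearRegression()
# Double brackets select a one-column DataFrame, shape (n_samples, 1).
crime_model.fit(X[["CrimeRate"]], X["HousePrice"])
fitted = crime_model.predict(X[["CrimeRate"]])

# Dots are the original data, the line is the fit from the simple regression.
lines = plt.plot(X["CrimeRate"], X["HousePrice"], ".",
                 X["CrimeRate"], fitted, "-")
plt.xlabel("CrimeRate")
plt.ylabel("HousePrice")
```

In a notebook with %matplotlib inline the figure is displayed after the cell runs; in a plain script, add plt.show() at the end.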