PythonNewb - 6 months ago
Python Question

How to add a line of best fit to Pandas and matplotlib scatter plot

I'm currently working with Pandas and matplotlib to perform some data visualization and I want to add a line of best fit to my scatter plot.

Here is my code:

import matplotlib
import matplotlib.pyplot as plt
import pandas as panda
import numpy as np

def PCA_scatter(filename):
    matplotlib.style.use('ggplot')

    data = panda.read_csv(filename)
    data_reduced = data[['2005', '2015']]

    data_reduced.plot(kind='scatter', x='2005', y='2015')
    plt.show()

PCA_scatter('file.csv')


How do I go about this?

Answer

You can use np.polyfit() and np.poly1d(). Fit a first-degree polynomial to the same x and y values, then draw it on the Axes object (ax) returned by the scatter plot call:

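import numpy as np
import pandas as pd

# Inside PCA_scatter, before plt.show(): keep the Axes returned by the scatter plot
ax = data_reduced.plot(kind='scatter', x='2005', y='2015')

# Fit a degree-1 polynomial; z holds [slope, intercept]
z = np.polyfit(x=data_reduced.loc[:, '2005'], y=data_reduced.loc[:, '2015'], deg=1)
p = np.poly1d(z)

# Evaluate the fit at the observed x values, indexed by x so it lines up with the scatter
trendline = pd.DataFrame(data=p(data_reduced.loc[:, '2005']), index=data_reduced.loc[:, '2005'])
trendline.plot(ax=ax)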
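If you would rather not build a separate DataFrame for the line, one alternative is to draw it straight onto the same Axes with ax.plot. The sketch below is only an illustration: the helper name add_trendline and the synthetic data standing in for the CSV are assumptions, not part of the question's file or the answer above. The Agg backend is selected so it runs without a display; drop that line and call plt.show() in an interactive session.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def add_trendline(ax, x, y):
    """Hypothetical helper: fit y = slope * x + intercept and draw it on ax."""
    slope, intercept = np.polyfit(x, y, deg=1)
    # Two end points are enough to draw a straight line.
    x_line = np.array([np.min(x), np.max(x)])
    ax.plot(x_line, slope * x_line + intercept, color='black', label='best fit')
    ax.legend()
    return slope, intercept


# Synthetic stand-in for the CSV columns '2005' and '2015' (made-up values).
rng = np.random.default_rng(0)
x_vals = rng.uniform(0, 100, size=50)
data_reduced = pd.DataFrame({
    '2005': x_vals,
    '2015': 1.5 * x_vals + 10 + rng.normal(0, 5, size=50),
})

ax = data_reduced.plot(kind='scatter', x='2005', y='2015')
slope, intercept = add_trendline(ax, data_reduced['2005'], data_reduced['2015'])
```

Because the fit is a straight line, plotting only the two end points gives the same picture as evaluating it at every x value.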

The fitted coefficients in z also give the line equation (z[0] is the slope, z[1] the intercept):

print('y={0:.2f} x + {1:.2f}'.format(z[0],z[1]))
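To check the coefficient order for yourself, here is a small sketch using made-up points that lie exactly on y = 2x + 1 (these values are an assumption for illustration, not the question's data). np.polyfit returns the coefficients with the highest power first, so for deg=1 the result is [slope, intercept].

```python
import numpy as np

# Made-up points that lie exactly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

z = np.polyfit(x, y, deg=1)   # coefficients, highest power first
p = np.poly1d(z)              # callable polynomial built from those coefficients

slope, intercept = z
equation = 'y={0:.2f} x + {1:.2f}'.format(slope, intercept)
print(equation)               # y=2.00 x + 1.00
print(p(4.0))                 # the fitted line evaluated at x = 4
```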