PythonNewb - 1 year ago 1810

Python Question

I'm currently working with Pandas and matplotlib to perform some data visualization and I want to add a line of best fit to my scatter plot.

Here is my code:

    import matplotlib
    import matplotlib.pyplot as plt
    import pandas as panda
    import numpy as np

    def PCA_scatter(filename):
        matplotlib.style.use('ggplot')
        data = panda.read_csv(filename)
        data_reduced = data[['2005', '2015']]
        data_reduced.plot(kind='scatter', x='2005', y='2015')
        plt.show()

    PCA_scatter('file.csv')

How do I go about this?

Answer Source

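You can use `np.polyfit()` and `np.poly1d()`. Fit a first-degree polynomial to the same `x` and `y` values, then draw it on the `ax` object returned by the scatter `.plot()` call. Inside `PCA_scatter`, this replaces the existing `data_reduced.plot(...)` line and goes before `plt.show()`: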

```
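import numpy as np
import pandas as pd

# Draw the scatter plot and keep the Axes it returns
ax = data_reduced.plot(kind='scatter', x='2005', y='2015')
# Fit a first-degree polynomial; z holds [slope, intercept]
z = np.polyfit(x=data_reduced.loc[:, '2005'], y=data_reduced.loc[:, '2015'], deg=1)
p = np.poly1d(z)
# Evaluate the fitted line at the original x values, indexed by x
trendline = pd.DataFrame(data=p(data_reduced.loc[:, '2005']), index=data_reduced.loc[:, '2005'])
trendline.plot(ax=ax)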
```

The coefficients in `z` also give you the line equation:

```
print('y={0:.2f} x + {1:.2f}'.format(z[0],z[1]))
```