Clip Clip - 1 month ago 7
Python Question

Pandas ValueError: too many values to unpack np.polyfit

I am trying to do a polynomial regression on some data I have. I have been able to successfully call

m, b = np.polyfit(x.values.flatten(), y.values.flatten(), 1)


However, when I increase the degree to anything higher than 1, I get the following error:

m, b = np.polyfit(x.values.flatten(), y.values.flatten(), 2)
ValueError: too many values to unpack


I am reading data as:

x = pandas.read_csv('D3.csv', usecols = [0])
y = pandas.read_csv('D3.csv', usecols = [3])


Any idea why this might be happening?

Answer

Before getting to your problem: according to the documentation, np.polyfit called with the default full=False returns at most two values:

  1. an array of coefficients for the fitted polynomial
  2. the covariance matrix for these coefficients

The second value is returned only when you call np.polyfit with cov=True while leaving full=False (its default).
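To make the difference in return values concrete, here is a minimal sketch. The synthetic data (noisy samples of y = 2x + 1) and the variable names are assumptions made for illustration, not part of the question's CSV file.

```python
import numpy as np

# Synthetic data (an assumption for illustration): noisy samples of y = 2x + 1
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

# Default call: a single array holding deg + 1 coefficients
coeffs = np.polyfit(x, y, 1)

# With cov=True: a pair (coefficients, covariance matrix)
coeffs_cov, cov = np.polyfit(x, y, 1, cov=True)

print(coeffs.shape)  # (2,)
print(cov.shape)     # (2, 2)
```

Only the second call returns something that is meant to be unpacked into two names, regardless of the degree.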

Back to your problem: since you do not set cov=True (and full is False by default), the function returns a single array. With degree 1, that array has two elements ([a, c] for the line ax + c), so Python unpacks it into m = a and b = c. With a higher degree, the array has more than two elements (deg + 1 of them), and Python cannot unpack them into only two names; you would need one variable per coefficient on the left-hand side. For example, consider this small run in Python 2:

>>> a,c = [1,2]
>>> a
1
>>> c
2
>>> a,c = [1,2,3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: too many values to unpack
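The same thing happens with the array returned by np.polyfit. The sketch below uses made-up data (exact samples of y = x**2, an assumption for illustration) to show that the result has deg + 1 elements, so two-name unpacking only works for degree 1. In Python 3 the error message also states how many values were expected.

```python
import numpy as np

# Hypothetical data: exact samples of y = x**2
x = np.arange(6, dtype=float)
y = x ** 2

# Record how many coefficients come back for each degree
lengths = {}
for deg in (1, 2, 3):
    coeffs = np.polyfit(x, y, deg)
    lengths[deg] = len(coeffs)

print(lengths)  # {1: 2, 2: 3, 3: 4}

# Unpacking three coefficients into two names raises ValueError
try:
    m, b = np.polyfit(x, y, 2)
except ValueError as exc:
    print("ValueError:", exc)
```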

Now, the correct way to do this is to assign the result to a single variable. Example:

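import sys

import numpy as np
import pandas

# Polynomial degree, taken from the command line
deg = int(sys.argv[1])

x = pandas.read_csv('test.csv', usecols=[0])
y = pandas.read_csv('test.csv', usecols=[1])

# Keep all deg + 1 coefficients in one array (highest power first)
m = np.polyfit(x.values.flatten(), y.values.flatten(), deg)

print(m)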

Assume that code is in file stackoverflow.py. Sample calls:

Chip chip@ 09:15:48@ ~: python stackoverflow.py 1
[  8. -12.]
Chip chip@ 09:15:51@ ~: python stackoverflow.py 2
[  1.00000000e+00  -1.45528372e-14   3.11727844e-14]
Chip chip@ 09:15:56@ ~: python stackoverflow.py 3
[ -3.16472437e-16   1.00000000e+00  -2.82243562e-14   4.43123853e-14]

In other words, all the coefficients are packed into a single array and returned.
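Once the coefficients are in a single array, they can be passed straight to np.polyval to evaluate the fitted polynomial, or unpacked explicitly when the degree is known. A small sketch, assuming made-up data sampled exactly from y = 3x^2 - 2x + 5 (the data and names are illustrative only):

```python
import numpy as np

# Hypothetical data: exact samples of y = 3x^2 - 2x + 5
x = np.linspace(-3.0, 3.0, 20)
y = 3.0 * x**2 - 2.0 * x + 5.0

# One array of three coefficients, highest power first
coeffs = np.polyfit(x, y, 2)

# Evaluate the fitted polynomial at new points
x_new = np.array([0.0, 1.0, 2.0])
y_new = np.polyval(coeffs, x_new)

# Explicit unpacking works when the number of names matches deg + 1
a, b, c = coeffs

print(coeffs)
print(y_new)
```

Keeping the array whole means the same code works for any degree, which is why the script above does not unpack the result.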
