Dhruv Ghulati - 4 months ago
Python Question

Can't set index of a pandas data frame - getting "KeyError"

I generate a data frame (summaryDF) that looks like this:

   accuracy        f1  precision    recall
0     0.494  0.722433   0.722433  0.722433
0     0.290  0.826087   0.826087  0.826087
0     0.274  0.629630   0.629630  0.629630
0     0.278  0.628571   0.628571  0.628571
0     0.288  0.718750   0.718750  0.718750
0     0.740  0.740000   0.740000  0.740000
0     0.698  0.765133   0.765133  0.765133
0     0.582  0.778547   0.778547  0.778547
0     0.682  0.748235   0.748235  0.748235
0     0.574  0.767918   0.767918  0.767918
0     0.398  0.711656   0.711656  0.711656
0     0.530  0.780083   0.780083  0.780083


Because I know what each row should be called, I am using this code to set the row names (these aren't the actual row names, just placeholders for argument's sake):

summaryDF = summaryDF.set_index(['A','B','C', 'D','E','F','G','H','I','J','K','L'])


However, I am getting:

level = frame[col].values
File "/Users/me/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 1797, in __getitem__
return self._getitem_column(key)
File "/Users/me/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 1804, in _getitem_column
return self._get_item_cache(key)
File "/Users/me/anaconda/lib/python2.7/site-packages/pandas/core/generic.py", line 1084, in _get_item_cache
values = self._data.get(item)
File "/Users/me/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 2851, in get
loc = self.items.get_loc(item)
File "/Users/me/anaconda/lib/python2.7/site-packages/pandas/core/index.py", line 1572, in get_loc
return self._engine.get_loc(_values_from_object(key))
File "pandas/index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas/index.c:3824)
File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:3704)
File "pandas/hashtable.pyx", line 686, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12280)
File "pandas/hashtable.pyx", line 694, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12231)
KeyError: 'A'


I have no idea what I am doing wrong and have researched far and wide. Any ideas?

Answer

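set_index treats a list of strings as column names, so it looks for a column called 'A' and raises the KeyError. To set the row labels, you need to assign the list to summaryDF.index instead. This works if the length of the list is the same as the length of the DataFrame: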

summaryDF.index = ['A','B','C', 'D','E','F','G','H','I','J','K','L']
print (summaryDF)
   accuracy        f1  precision    recall
A     0.494  0.722433   0.722433  0.722433
B     0.290  0.826087   0.826087  0.826087
C     0.274  0.629630   0.629630  0.629630
D     0.278  0.628571   0.628571  0.628571
E     0.288  0.718750   0.718750  0.718750
F     0.740  0.740000   0.740000  0.740000
G     0.698  0.765133   0.765133  0.765133
H     0.582  0.778547   0.778547  0.778547
I     0.682  0.748235   0.748235  0.748235
J     0.574  0.767918   0.767918  0.767918
K     0.398  0.711656   0.711656  0.711656
L     0.530  0.780083   0.780083  0.780083

print (summaryDF.index)
Index(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L'], dtype='object')
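
To make the difference concrete, here is a minimal sketch that reproduces the KeyError and then applies the fix. The three-row frame and the names df and labels are hypothetical stand-ins for summaryDF, not the asker's real data:

```python
import pandas as pd

# Hypothetical three-row stand-in for summaryDF (every row labelled 0)
df = pd.DataFrame(
    {"accuracy": [0.494, 0.290, 0.274], "f1": [0.722433, 0.826087, 0.629630]},
    index=[0, 0, 0],
)
labels = ["A", "B", "C"]

# A list of strings is read as column names, and no such columns exist
try:
    df.set_index(labels)
    raised = False
except KeyError:
    raised = True

# Assigning to .index replaces the row labels directly
df.index = labels
index_after = list(df.index)
```

After the assignment, rows can be selected by label, for example df.loc["B", "accuracy"].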

Timings:

In [117]: %timeit summaryDF.index = ['A','B','C', 'D','E','F','G','H','I','J','K','L']
The slowest run took 6.86 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 76.2 µs per loop

In [118]: %timeit summaryDF.set_index(pd.Index(['A','B','C', 'D','E','F','G','H','I','J','K','L']))
The slowest run took 6.77 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 227 µs per loop

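Another solution is to convert the list to a numpy array. set_index uses an array as the index values themselves instead of looking the entries up as column names: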

import numpy as np

summaryDF.set_index(np.array(['A','B','C', 'D','E','F','G','H','I','J','K','L']), inplace=True)
print (summaryDF)
   accuracy        f1  precision    recall
A     0.494  0.722433   0.722433  0.722433
B     0.290  0.826087   0.826087  0.826087
C     0.274  0.629630   0.629630  0.629630
D     0.278  0.628571   0.628571  0.628571
E     0.288  0.718750   0.718750  0.718750
F     0.740  0.740000   0.740000  0.740000
G     0.698  0.765133   0.765133  0.765133
H     0.582  0.778547   0.778547  0.778547
I     0.682  0.748235   0.748235  0.748235
J     0.574  0.767918   0.767918  0.767918
K     0.398  0.711656   0.711656  0.711656
L     0.530  0.780083   0.780083  0.780083
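
Two details that may be worth checking in your own session: without inplace=True, set_index returns a new DataFrame and leaves the original untouched, and assigning a list of the wrong length to .index raises a ValueError. A small sketch, again on a hypothetical three-row frame (df, labels and relabelled are illustrative names):

```python
import numpy as np
import pandas as pd

# Hypothetical three-row stand-in for summaryDF
df = pd.DataFrame({"accuracy": [0.494, 0.290, 0.274]}, index=[0, 0, 0])
labels = ["A", "B", "C"]

# Without inplace=True the result is a new frame; df keeps its old index
relabelled = df.set_index(np.array(labels))
original_index = list(df.index)
new_index = list(relabelled.index)

# A list whose length differs from the number of rows is rejected
try:
    df.index = ["A", "B"]
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```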