Richard Richard - 2 months ago 11
Python Question

Why does `set_index` create an index label for the column name?

I have a CSV file which begins like this:

Year,Boys,Girls
1996,333490,315995
1997,329577,313518
1998,325903,309998


When I read it into pandas and set an index, it isn't doing quite what I expect:

import pandas as pd

df = pd.read_csv('../data/myfile.csv')
df.set_index('Year', inplace=True)
df.head()


Why does the output show an extra row for the column label `Year`, with blank values next to it? Shouldn't that label simply disappear once the column becomes the index?
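A minimal sketch that reproduces this without the file, assuming the three sample rows above are the whole dataset (`io.StringIO` stands in for `'../data/myfile.csv'`). It shows that `set_index` keeps the consumed column's label as the index's name, and that name is what appears on its own line in the printed frame:

```python
import io

import pandas as pd

# The sample rows from the question, standing in for the real CSV file.
csv_text = """Year,Boys,Girls
1996,333490,315995
1997,329577,313518
1998,325903,309998
"""

df = pd.read_csv(io.StringIO(csv_text))
df.set_index("Year", inplace=True)

# "Year" is no longer a column; it survives only as the index's name.
print(list(df.columns))  # ['Boys', 'Girls']
print(df.index.name)     # Year

# When printed, the index name gets its own header line above the rows.
print(df.head())
```

So the "blank" row is not data; it is metadata that pandas displays alongside the index.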


Also, I'm not clear on how to retrieve the values for 1998. If I try

df.loc['1998']

I get an error:

KeyError: 'the label [1998] is not in the [index]'

Answer

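The extra row is not data: it is the name of the index (`Year`), which `set_index` carries over from the column and which pandas prints on its own line. If you don't want it, set the name attribute of your index to None: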

df.index.names = [None]
df.head()
#       Boys    Girls
#1996   333490  315995
#1997   329577  313518
#1998   325903  309998
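For a single-level index, assigning `df.index.name = None` has the same effect, and `rename_axis(None)` returns a copy without the name instead of modifying `df` in place. A small sketch, using a hand-built frame shaped like the question's data after `set_index('Year')` (an assumption, since the original file isn't available):

```python
import pandas as pd

# Hypothetical frame shaped like the question's data after set_index('Year').
df = pd.DataFrame(
    {"Boys": [333490, 329577, 325903], "Girls": [315995, 313518, 309998]},
    index=pd.Index([1996, 1997, 1998], name="Year"),
)

# Option 1: return a new DataFrame whose index has no name; df is untouched.
unnamed = df.rename_axis(None)
print(df.index.name)       # Year
print(unnamed.index.name)  # None

# Option 2: clear the name in place (same effect as df.index.names = [None]).
df.index.name = None

# Only the label is gone; the index values themselves are unchanged.
print(df.index.tolist())   # [1996, 1997, 1998]
```

Keeping the name is harmless, though: it only affects display, and `reset_index()` uses it to restore the `Year` column.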

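As for retrieving the data for 1998: `read_csv` parsed the `Year` column as integers, so the index label is the integer 1998, not the string '1998'. Simply lose the quotes: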

df.loc[1998]
#Boys     325903
#Girls    309998
#Name: 1998, dtype: int64
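A sketch of the lookups, reusing the question's sample rows via `io.StringIO` (an assumption in place of the real file) and passing `index_col='Year'` to `read_csv`, which builds the same named index as calling `set_index('Year')` afterwards. It also shows one way to make string labels work if you prefer them:

```python
import io

import pandas as pd

csv_text = """Year,Boys,Girls
1996,333490,315995
1997,329577,313518
1998,325903,309998
"""

# index_col="Year" builds the same index that set_index("Year") would.
df = pd.read_csv(io.StringIO(csv_text), index_col="Year")

# The labels are integers, which is why df.loc['1998'] raises KeyError.
print(df.index.dtype)  # int64

row = df.loc[1998]           # whole row as a Series
boys = df.loc[1998, "Boys"]  # a single value

# If string labels are preferred, convert the index explicitly first.
df_str = df.copy()
df_str.index = df_str.index.astype(str)
girls = df_str.loc["1998", "Girls"]

print(boys, girls)  # 325903 309998
```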