drevicko drevicko - 8 months ago 28
Python Question

Can't access dataframe columns

I'm importing a dataframe from a CSV file, but cannot access some of its columns by name. What's going on?

In more concrete terms:

> import pandas

> jobNames = pandas.read_csv("job_names.csv")
> print(jobNames)

   job_id  job_name  num_judgements
0  933985       Foo             180
1  933130       Moo             175
2  933123       Goo             150
3  933094      Flue             120
4  933088       Tru             120

When I try to access the second column, I get an error:

> jobNames.job_name

AttributeError: 'DataFrame' object has no attribute 'job_name'

Strangely, I can access the job_id column thus:

> print(jobNames.job_id)

0    933985
1    933130
2    933123
3    933094
4    933088
Name: job_id, dtype: int64

Edit (to put the accepted answer in context):

It turns out that the first row of the csv file (with the column names) looks like this:

job_id, job_name, num_judgements

Note the spaces after each comma! Those spaces are retained in the column names:

> jobNames.columns[1]

' job_name'

Names with a leading space aren't valid Python identifiers, so those columns aren't available as dataframe attributes. I can still access them dict-style:

> jobNames[' job_name']
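
If re-reading the file isn't convenient, one option is to strip the whitespace from the column names after loading. The following is only a sketch: the inline `csv_text` is a made-up stand-in for job_names.csv (two rows, with a space after each comma), and the `job_names` variable name is illustrative.

```python
import io

import pandas as pd

# Hypothetical stand-in for job_names.csv: note the space after each comma.
csv_text = (
    "job_id, job_name, num_judgements\n"
    "933985, Foo, 180\n"
    "933130, Moo, 175\n"
)

job_names = pd.read_csv(io.StringIO(csv_text))

# repr() makes the leading spaces in the column names visible.
print([repr(name) for name in job_names.columns])

# Strip surrounding whitespace from every column name.
job_names.columns = job_names.columns.str.strip()

# Attribute access now works for the renamed column.
print(job_names.job_name.tolist())
```

This only cleans the header. Text values in the rows still carry their leading space, which is why fixing it at read time is usually the simpler route.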


When calling pandas.read_csv, pass skipinitialspace=True to skip the whitespace that follows each CSV delimiter.
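
A minimal sketch of that fix, assuming a file shaped like the one in the question. The inline `csv_text` is a made-up two-row stand-in for job_names.csv so the example runs without the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for job_names.csv: note the space after each comma.
csv_text = (
    "job_id, job_name, num_judgements\n"
    "933985, Foo, 180\n"
    "933130, Moo, 175\n"
)

# skipinitialspace=True drops the spaces that follow each delimiter,
# in the header row and in the data rows alike.
job_names = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

print(list(job_names.columns))
print(job_names.job_name.tolist())
```

With the real file, the same call would be `pandas.read_csv("job_names.csv", skipinitialspace=True)`, after which `jobNames.job_name` should resolve as an attribute.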