KatieRose1029 KatieRose1029 - 10 days ago 4
Python Question

Filtering Values in Pandas Dataframe or Series

I am trying to filter values from a column in a pandas Dataframe but I seem to be receiving booleans instead of actual values.
I am trying to filter our data by month and year.
In the code following below you will see I am only filtering out by year but I have tried month and year multiple times in different ways:

In [1]: import requests

In [2]: import pandas as pd # pandas

In [3]: import datetime as dt # module for manipulating dates and times

In [4]: url = "http://elections.huffingtonpost.com/pollster/2012-general-election-romney-vs-obama.csv"

In [5]: source = requests.get(url).text

In [6]: from io import StringIO, BytesIO

In [7]: s = StringIO(source)

In [8]: election_data = pd.DataFrame.from_csv(s, index_col=None).convert_objects(convert_dates="coerce", convert_numeric=True)


In [9]: election_data.head(n=3)
Out[9]:
Pollster Start Date End Date Entry Date/Time (ET) \
0 Politico/GWU/Battleground 2012-11-04 2012-11-05 2012-11-06 08:40:26
1 YouGov/Economist 2012-11-03 2012-11-05 2012-11-26 15:31:23
2 Gravis Marketing 2012-11-03 2012-11-05 2012-11-06 09:22:02

Number of Observations Population Mode Obama Romney \
0 1000.0 Likely Voters Live Phone 47.0 47.0
1 740.0 Likely Voters Internet 49.0 47.0
2 872.0 Likely Voters Automated Phone 48.0 48.0

Undecided Other Pollster URL \
0 6.0 NaN http://elections.huffingtonpost.com/pollster/p...
1 3.0 NaN http://elections.huffingtonpost.com/pollster/p...
2 4.0 NaN http://elections.huffingtonpost.com/pollster/p...

Source URL Partisan Affiliation \
0 http://www.politico.com/news/stories/1112/8338... Nonpartisan None
1 http://cdn.yougov.com/cumulus_uploads/document... Nonpartisan None
2 http://www.gravispolls.com/2012/11/gravis-mark... Nonpartisan None

Question Text Question Iteration
0 NaN 1
1 NaN 1
2 NaN 1

In [10]: start_date = pd.Series(election_data["Start Date"])
...: start_date.head(n=3)
...:
Out[10]:
0 2012-11-04
1 2012-11-03
2 2012-11-03
Name: Start Date, dtype: datetime64[ns]

In [11]: filtered = start_date.map(lambda x: x.year == 2012)

In [12]: filtered
Out[12]:
0 True
1 True
2 True
...
587 False
588 False
589 False
Name: Start Date, dtype: bool

Answer

I think you need read_csv with url address first and then boolean indexing with mask created by year and month:

election_data = pd.read_csv('http://elections.huffingtonpost.com/pollster/2012-general-election-romney-vs-obama.csv', parse_dates=[1,2,3])

print (election_data.head(3))
                    Pollster Start Date   End Date Entry Date/Time (ET)  \
0  Politico/GWU/Battleground 2012-11-04 2012-11-05  2012-11-06 08:40:26   
1           YouGov/Economist 2012-11-03 2012-11-05  2012-11-26 15:31:23   
2           Gravis Marketing 2012-11-03 2012-11-05  2012-11-06 09:22:02   

   Number of Observations     Population             Mode  Obama  Romney  \
0                  1000.0  Likely Voters       Live Phone   47.0    47.0   
1                   740.0  Likely Voters         Internet   49.0    47.0   
2                   872.0  Likely Voters  Automated Phone   48.0    48.0   

   Undecided  Other                                       Pollster URL  \
0        6.0    NaN  http://elections.huffingtonpost.com/pollster/p...   
1        3.0    NaN  http://elections.huffingtonpost.com/pollster/p...   
2        4.0    NaN  http://elections.huffingtonpost.com/pollster/p...   

                                          Source URL     Partisan Affiliation  \
0  http://www.politico.com/news/stories/1112/8338...  Nonpartisan        None   
1  http://cdn.yougov.com/cumulus_uploads/document...  Nonpartisan        None   
2  http://www.gravispolls.com/2012/11/gravis-mark...  Nonpartisan        None   

   Question Text  Question Iteration  
0            NaN                   1  
1            NaN                   1  
2            NaN                   1  
print (election_data.dtypes)
Pollster                          object
Start Date                datetime64[ns]
End Date                  datetime64[ns]
Entry Date/Time (ET)      datetime64[ns]
Number of Observations           float64
Population                        object
Mode                              object
Obama                            float64
Romney                           float64
Undecided                        float64
Other                            float64
Pollster URL                      object
Source URL                        object
Partisan                          object
Affiliation                       object
Question Text                    float64
Question Iteration                 int64
dtype: object


election_data[election_data["Start Date"].dt.year == 2012]

election_data[(election_data["Start Date"].dt.year == 2012) & (election_data["Start Date"].dt.month== 10)]