Mike Woodward - 4 months ago
Python Question

Selecting rows from Pandas Series where rows are arrays

I'm trying to analyze US polling data. Specifically, I'm trying to work out which States are safe, marginal, or tight (their 'closeness'). I have a dataframe of survey results over time, each with a 'closeness' value, and I'm using this Pandas statement to summarize the 'closeness' entries for each State:

s=self.daily.groupby('State')['closeness'].unique()


This gives me the following series (a selection is shown for brevity):

State
AK [safe]
AL [safe]
CA [safe]
CO [safe, tight, marginal]
FL [marginal, tight]
IA [safe, tight, marginal]
ID [safe]
IL [safe]
IN [tight, safe]
Name: closeness, dtype: object


Each row is a NumPy array, so, for example,
s[0]
gives:

array(['safe'], dtype=object)


I'm trying to select from this series, but I can't get the syntax right. For example, I'm trying to select just the 'safe' States using this syntax:

ipdb> s[s == 'safe']
*** ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()


This doesn't work either:

test[test == ['safe']]


Here's what I'd like to do: select States that are 'marginal' or 'tight', select States that are 'safe' and only 'safe', and so on. Does anyone know the syntax I should use, or a better approach in the first place?

============
Here's a sample of the data before the groupby:

ipdb> self.daily.head(3)
Date Democratic share Margin Method Other share \
0 2008-11-04 0.378894 -0.215351 Election 0.026861
1 2008-11-04 0.387404 -0.215765 Election 0.009427
2 2008-11-04 0.388647 -0.198512 Election 0.024194

Republican share State closeness winner
0 0.594245 AK safe Republican
1 0.603169 AL safe Republican

Answer

Suppose you have a DataFrame with a column of lists, for example:

import pandas as pd
df = pd.DataFrame({'a': [['safe'], ['safe', 'tight'], []]})

Then, to see which rows are exactly ['safe'], you can use:

In [7]: df.a.apply(lambda x: x == ['safe'])
Out[7]: 
0     True
1    False
2    False
Name: a, dtype: bool

To find the rows that include 'safe', you can use:

In [9]: df.a.apply(lambda x: 'safe' in x)
Out[9]: 
0     True
1     True
2    False
Name: a, dtype: bool

and so on.
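One caveat: the series in the question holds NumPy arrays rather than lists, so a comparison such as x == ['safe'] is evaluated elementwise instead of giving a single True or False. A possible workaround, sketched below, is to convert each array to a set first; the order of the labels then no longer matters. The small daily frame here is made-up illustration data (only the 'State' and 'closeness' column names come from the question), and the variable names are hypothetical.

```python
import pandas as pd

# Made-up stand-in for the poll data; only the column names
# 'State' and 'closeness' are taken from the question.
daily = pd.DataFrame({
    'State': ['AK', 'AK', 'CO', 'CO', 'CO', 'FL', 'FL', 'IN', 'IN'],
    'closeness': ['safe', 'safe', 'safe', 'tight', 'marginal',
                  'marginal', 'tight', 'tight', 'safe'],
})

# unique() gives one NumPy array per State; turn each into a set
# so that whole-row comparisons return a single boolean.
labels = daily.groupby('State')['closeness'].unique().apply(lambda arr: set(arr))

# States that are 'safe' and only 'safe'
only_safe = labels[labels.apply(lambda x: x == {'safe'})]

# States that were 'marginal' or 'tight' at least once
marginal_or_tight = labels[labels.apply(lambda x: bool(x & {'marginal', 'tight'}))]

# States that were never 'safe'
never_safe = labels[labels.apply(lambda x: 'safe' not in x)]
```

Each result is an ordinary Series indexed by State, so something like only_safe.index.tolist() would give the matching State codes.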