dartdog dartdog - 6 months ago 117
Python Question

pandas select from Dataframe using startswith

This works (using Pandas 12 dev)

table2=table[table['SUBDIVISION'] =='INVERNESS']

Then I realized I needed to select the field using "starts with" Since I was missing a bunch.
So per the Pandas doc as near as I could follow I tried

criteria = table['SUBDIVISION'].map(lambda x: x.startswith('INVERNESS'))
table2 = table[criteria]

And got AttributeError: 'float' object has no attribute 'startswith'

So I tried an alternate syntax with the same result

table[[x.startswith('INVERNESS') for x in table['SUBDIVISION']]]

Reference http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing
Section 4: List comprehensions and map method of Series can also be used to produce more complex criteria:

What am I missing?


You can use the str.startswith DataFrame method to give more consistent results:

In [11]: s = pd.Series(['a', 'ab', 'c', 11, np.nan])

In [12]: s
0      a
1     ab
2      c
3     11
4    NaN
dtype: object

In [13]: s.str.startswith('a', na=False)
0     True
1     True
2    False
3    False
4    False
dtype: bool

and the boolean indexing will work just fine (I prefer to use loc, but it works just the same without):

In [14]: s.loc[s.str.startswith('a', na=False)]
0     a
1    ab
dtype: object


It looks least one of your elements in the Series/column is a float, which doesn't have a startswith method hence the AttributeError, the list comprehension should raise the same error...