Michael Currie Michael Currie - 7 months ago 49
Python Question

Pandas WHERE clause for string index?

Say I have a table,

ss_prices
, which has a primary key column called
fund_code
that
pandas
treats as the index:

>>> arr = list(zip(['MM1', 'MM2', '3MM', '4AA'], range(1,5)))
>>> cols = ['fund_code', 'values']
>>> ss_prices = pd.DataFrame(arr, columns=cols).set_index('fund_code')
>>> ss_prices
values
fund_code
MM1 1
MM2 2
3MM 3
4AA 4


I want to get only those rows where the primary key starts with 'MM'. In SQL I can do:

select * from ss_prices
where left(fund_code, 2) = 'MM'


But in
pandas
it seems I have to do:

ss_prices[np.vectorize(lambda x: x[:2] == 'MM')(ss_prices.index.values)]


The
pandas
syntax is certainly more confusing and less readable. Short of using a tool like pandasql, is there a more readable way of accomplishing a
WHERE
clause?

Answer

You can use DataFrame.filter to filter on the index using a regular expression:

In [5]: ss_prices
Out[5]:
     values
MM1       1
MM2       2
3MM       3
4AA       4

In [6]: ss_prices.filter(regex='^MM', axis=0)
Out[6]:
     values
MM1       1
MM2       2