NRKirby - 5 months ago
Python Question

How do I create a DataFrame containing a column where rows are greater than a number?

I have a DataFrame with these columns:

observationID int64
recordKey int64
gridReference object
siteKey float64
siteName float64
featureKey int64
startDate object
endDate object
pTaxonVersionKey object
taxonName object
authority object
commonName object
ints int32


I want to create a new DataFrame containing the columns commonName and ints, keeping only the rows where ints is greater than 10. I am doing:

df_greater_10 = df[['commonName', df[df.ints >= 1997]]]


I see the problem lies with the expression df[df.ints >= 1997], since it returns a whole DataFrame. How can I get just the ints column with values greater than 10?

Answer

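You can use one of several indexers. I would recommend .loc, which takes a boolean row mask and a list of column labels in a single call (the older .ix indexer was deprecated in pandas 0.20 and removed in 1.0):

df_greater_10 = df.loc[df.ints >= 1997, ['commonName', 'ints']]

or, if you need only the ints column:

df_greater_10 = df.loc[df.ints >= 1997, 'ints']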
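The difference between passing a list of labels and a single label can be seen in a small sketch using .loc. The data here is made up for illustration (only the column names commonName and ints come from the question), and the threshold of 10 follows the question text rather than the 1997 in the code:

```python
import pandas as pd

# Made-up data for illustration; only the column names come from the question.
df = pd.DataFrame({
    "commonName": ["Robin", "Wren", "Blackbird", "Dunnock"],
    "ints": [4, 12, 25, 10],
})

# Boolean Series, True for the rows that should be kept.
mask = df["ints"] > 10

# A list of column labels gives back a DataFrame with those columns.
both = df.loc[mask, ["commonName", "ints"]]

# A single column label gives back a Series.
only_ints = df.loc[mask, "ints"]

print(type(both).__name__, both.shape)
print(type(only_ints).__name__, only_ints.tolist())
```

In both cases the original row labels are kept, so the result can still be aligned with the source DataFrame.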

Demo:

In [123]: df = pd.DataFrame(np.random.randint(5, 15, (10, 3)), columns=list('abc'))

In [124]: df
Out[124]:
    a   b   c
0  13  11  14
1  14  10  13
2   7  11   6
3   7  13  12
4   9   9   6
5   7   7   7
6   5   7   8
7   5  11   5
8   9   7   9
9  11  13   7

In [125]: df_greater_10 = df.loc[df.c > 10, ['a','c']]

In [126]: df_greater_10
Out[126]:
    a   c
0  13  14
1  14  13
3   7  12
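Since the demo above uses random data, here is a reproducible sketch with fixed, made-up values that compares a few equivalent ways of writing the same selection. The sample rows and the cols helper list are assumptions for illustration; the threshold of 10 follows the question text:

```python
import pandas as pd

# Made-up data for illustration; only the column names come from the question.
df = pd.DataFrame({
    "commonName": ["Robin", "Wren", "Blackbird", "Dunnock"],
    "ints": [4, 12, 25, 10],
})
cols = ["commonName", "ints"]

# Rows and columns selected in one step.
via_loc = df.loc[df["ints"] > 10, cols]

# Filter the rows first, then pick the columns.
via_mask = df[df["ints"] > 10][cols]

# Same filter written as a query string.
via_query = df.query("ints > 10")[cols]

# All three produce the same rows, columns, and index.
assert via_loc.equals(via_mask) and via_loc.equals(via_query)
print(via_loc)
```

The single-step .loc form is probably the safest default if you plan to assign into the result later, because the two-step form can raise a SettingWithCopyWarning in some pandas versions.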