Jason Boyd - 3 months ago
Python Question

How do I filter rows in a dataframe that have whole numbers in one column

I am new to Python and Pandas and I am struggling a bit with this.

I have a set of data with an Age column of type float. Some of the values have a fractional part and some do not. I want to remove all the rows that have whole number values for Age.

This was my attempt at it:

estimatedAges = train[int(train['Age']) < train['Age']]

But I got this error:

TypeError Traceback (most recent call last)
in ()
1 #estimatedAges = train[train['Age'] > 1]
----> 2 estimatedAges = train[int(train['Age']) < train['Age']]
3 estimatedAges.info()

C:\Anaconda3\lib\site-packages\pandas\core\series.py in wrapper(self)
76 return converter(self.iloc[0])
77 raise TypeError("cannot convert the series to "
---> 78 "{0}".format(str(converter)))
80 return wrapper

TypeError: cannot convert the series to <class 'int'>

So it looks like int() does not work on series data and I am going to have to find another approach. I'm just not sure what that other approach is.


I think you can use astype to cast the column to int:

estimatedAges = train[train['Age'].astype(int) < train['Age']]


import pandas as pd

train = pd.DataFrame({'Age':[1,2,3.4]})
print (train)
   Age
0  1.0
1  2.0
2  3.4

print (train[train['Age'].astype(int) < train['Age']])
   Age
2  3.4
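
As background on the error in the question, here is a minimal sketch (reusing the sample frame above) of why int() fails where astype(int) works: the built-in int() expects a single value, while astype converts the column element by element. The variable names raised and truncated are only illustrative.

```python
import pandas as pd

train = pd.DataFrame({'Age': [1, 2, 3.4]})

# int() expects one value, so a multi-row Series raises TypeError
try:
    int(train['Age'])
    raised = False
except TypeError:
    raised = True

# astype(int) converts every element and drops the fractional part
truncated = train['Age'].astype(int)

print(raised)              # True
print(truncated.tolist())  # [1, 2, 3]
```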


train = pd.DataFrame({'Age':[1,2,3.4]})
train = pd.concat([train]*10000).reset_index(drop=True)

In [62]: %timeit (train[train['Age'].astype(int) < train['Age']])
The slowest run took 6.59 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 544 µs per loop

In [63]: %timeit (train[train['Age'].apply(int) < train['Age']])
100 loops, best of 3: 11.1 ms per loop

In [64]: %timeit (train[train.Age > train.Age.round(0)])
1000 loops, best of 3: 1.55 ms per loop
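
A caveat on the round(0) variant timed above: with > it only keeps values that round down, so a value such as 3.6 (which rounds up to 4.0) is dropped. The sketch below uses made-up values to show this. Comparing with != instead is my own suggestion and was not part of the timings.

```python
import pandas as pd

# Hypothetical values; 3.6 is included because it rounds up
ages = pd.DataFrame({'Age': [1, 2, 3.4, 3.6]})

# 3.6 > 4.0 is False, so the row with 3.6 is lost
with_gt = ages[ages.Age > ages.Age.round(0)]

# != keeps every value that differs from its rounded value
with_ne = ages[ages.Age != ages.Age.round(0)]

print(with_gt['Age'].tolist())  # [3.4]
print(with_ne['Age'].tolist())  # [3.4, 3.6]
```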

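EDIT, thanks to a comment by ajcr:

If the column contains both negative and positive floats, compare with != instead of <, because astype(int) truncates toward zero (-2.8 becomes -2, which is not less than -2.8):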

train = pd.DataFrame({'Age':[1,-2.8,3.9]})
print (train)
   Age
0  1.0
1 -2.8
2  3.9

print (train[train['Age'].astype(int) != train['Age']])
   Age
1 -2.8
2  3.9
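
To make the difference between the two comparisons concrete, here is a small sketch on the same data that puts both masks side by side. The names truncated, lt_mask and ne_mask are only illustrative.

```python
import pandas as pd

train = pd.DataFrame({'Age': [1, -2.8, 3.9]})

# astype(int) truncates toward zero: -2.8 becomes -2, not -3
truncated = train['Age'].astype(int)

# -2 < -2.8 is False, so the < test misses the negative row
lt_mask = truncated < train['Age']

# != flags every value that changed when it was truncated
ne_mask = truncated != train['Age']

print(truncated.tolist())  # [1, -2, 3]
print(lt_mask.tolist())    # [False, False, True]
print(ne_mask.tolist())    # [False, True, True]
```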