Valerio D. Ciotti Valerio D. Ciotti - 1 month ago 8
Python Question

Filter rows in a pandas DataFrame based on a value

I have DataFrame similar to the below (this is just a sample):

i TIME CITIES_LABEL Value lat_rounded long

2 2005 Tilburg 22 250 52.070498 4.300700
3 2005 Amsterdam 45 825 52.370216 4.895168
4 2005 Rotterdam 27 600 51.924420 4.477733
5 2005 Utrecht 12 915 52.090737 5.121420
6 2005 Eindhoven 9 165 51.441642 5.469722
7 2006 Tilburg 7 800 51.560596 5.091914
8 2005 Groningen 7 620 53.219383 6.566502
9 2005 Enschede 6 250 52.221537 6.893662
10 2005 Arnhem 6 025 51.985103 5.898730
11 2006 Utrecht 3 400 50.888174 5.979499
12 2006 Amsterdam 6 795 52.350785 5.264702
13 2005 Breda 8 565 51.571915 4.768323
14 2010 Groningen 6 325 51.812563 5.837226
15 2005 Apeldoorn 7 005 52.211157 5.969923
16 2007 Utrecht 3 785 53.201233 5.799913
17 2006 Rotterdam 7 130 52.387388 4.646219
18 2005 Zaanstad 6 060 52.457966 4.751042
19 2008 Tilburg 6 945 51.697816 5.303675
20 2007 Amsterdam 5 840 52.156111 5.387827
21 2005 Maastricht 5 220 50.851368 5.690972


Cities are repeated along the
CITIES_LABEL
field. I would like to filter the cities based on their highest
TIME
value. An example of the output I would like is:

i TIME CITIES_LABEL Value lat_rounded long

6 2005 Eindhoven 9 165 51.441642 5.469722
9 2005 Enschede 6 250 52.221537 6.893662
10 2005 Arnhem 6 025 51.985103 5.898730
13 2005 Breda 8 565 51.571915 4.768323
14 2010 Groningen 6 325 51.812563 5.837226
15 2005 Apeldoorn 7 005 52.211157 5.969923
16 2007 Utrecht 3 785 53.201233 5.799913
17 2006 Rotterdam 7 130 52.387388 4.646219
18 2005 Zaanstad 6 060 52.457966 4.751042
19 2008 Tilburg 6 945 51.697816 5.303675
20 2007 Amsterdam 5 840 52.156111 5.387827
21 2005 Maastricht 5 220 50.851368 5.690972


Any thoughts on how best to approach this issue in pandas?

EDIT

my question is different from Python : How can I get Rows which have the max value of the group to which they belong? because I am looking for a filter for both
TIME
and
CITIES_LABEL
while the previous question is only looking at filtering based to a (maximum) value of one field, and it does not care for duplicates in other fields

Answer

use groupby and idxmax

df.ix[df.groupby('CITIES_LABEL').TIME.idxmax()]

enter image description here

Comments