Map Map - 4 months ago 8

Python - how to extract the last occurrence meeting a certain condition from a list

For example, I have the following data as a list:

l = [['A', 'aa', '1', '300'],
     ['A', 'ab', '2', '30'],
     ['A', 'ac', '3', '60'],
     ['B', 'ba', '5', '50'],
     ['B', 'bb', '4', '10'],
     ['C', 'ca', '6', '50']]


Now for 'A', 'B', and 'C', I want to get their last occurrences, i.e.:

[['A', 'ac', '3', '60'],
 ['B', 'bb', '4', '10'],
 ['C', 'ca', '6', '50']]


or, going further, just the third column of these rows, i.e.:

['3', '4', '6']


Currently, the way I deal with this is:

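import pandas as pd
df = pd.DataFrame(l, columns=['u', 'w', 'y', 'z'])
df.set_index('u', inplace=True)
ll = []
for letter in df.index.unique():
    # .ix was removed in pandas 1.0; .loc[[letter], 'y'] always returns a Series
    # (even when the letter occurs only once), and .iloc[-1] takes its last value
    ll.append(df.loc[[letter], 'y'].iloc[-1])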
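A vectorized alternative, offered here as a sketch rather than the OP's method, lets pandas do the grouping instead of looping over the unique index. `GroupBy.last()` returns the last non-null value of each column per group, and `sort=False` keeps the groups in order of first appearance. The names `last_rows` and `ll` are just illustrative.

```python
import pandas as pd

l = [['A', 'aa', '1', '300'],
     ['A', 'ab', '2', '30'],
     ['A', 'ac', '3', '60'],
     ['B', 'ba', '5', '50'],
     ['B', 'bb', '4', '10'],
     ['C', 'ca', '6', '50']]

df = pd.DataFrame(l, columns=['u', 'w', 'y', 'z'])

# Last row per key in column 'u'; sort=False keeps the keys in first-appearance order.
last_rows = df.groupby('u', sort=False).last()

# Only the third column ('y') of those rows, as a plain Python list.
ll = last_rows['y'].tolist()
print(ll)  # ['3', '4', '6']
```

Because `last()` skips missing values column by column, `df.drop_duplicates('u', keep='last')` may be the better fit if the data can contain NaN and you need the literal last row.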


When I time it with %timeit, it shows:

>> The slowest run took 27.86 times longer than the fastest.
>> This could mean that an intermediate result is being cached.
>> 1000000 loops, best of 3: 887 ns per loop


Is there a way to do this faster than my code? Thanks!

Answer
import itertools

l = [['A', 'aa', '1', '300'],
     ['A', 'ab', '2', '30'],
     ['A', 'ac', '3', '60'],
     ['B', 'ba', '5', '50'],
     ['B', 'bb', '4', '10'],
     ['C', 'ca', '6', '50']]

# groupby only merges adjacent rows, so l must already be sorted by the first element
for key, group in itertools.groupby(l, lambda x: x[0]):
    print(key, list(group)[-1])
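A variant of the same groupby idea, sketched under the same sorted-input assumption, avoids building a full list for every group: a `deque` with `maxlen=1` consumes the group and keeps only its final item. The helper name `last_per_group` is hypothetical.

```python
from collections import deque
from itertools import groupby
from operator import itemgetter

l = [['A', 'aa', '1', '300'],
     ['A', 'ab', '2', '30'],
     ['A', 'ac', '3', '60'],
     ['B', 'ba', '5', '50'],
     ['B', 'bb', '4', '10'],
     ['C', 'ca', '6', '50']]

def last_per_group(rows, key=itemgetter(0)):
    """Yield the last row of each run of equal keys.

    Like groupby itself, this only merges *adjacent* rows, so the input
    should already be sorted (or at least grouped) by key.
    """
    for _, group in groupby(rows, key):
        # deque(maxlen=1) drains the group, keeping only its final row
        yield deque(group, maxlen=1)[0]

last_rows = list(last_per_group(l))
third_col = [row[2] for row in last_rows]
print(third_col)  # ['3', '4', '6']
```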

No comment on "efficiency", because you haven't explained your conditions at all. This assumes the list is already sorted by the first element of each sublist.
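If the list might not be sorted, a sketch that drops that assumption uses a dict: storing each row under its key overwrites earlier rows, so the last occurrence wins. Since Python 3.7, dicts preserve insertion order, so keys come out in the order they were first seen. The function name `last_occurrences` and the `unsorted` sample are hypothetical.

```python
l = [['A', 'aa', '1', '300'],
     ['A', 'ab', '2', '30'],
     ['A', 'ac', '3', '60'],
     ['B', 'ba', '5', '50'],
     ['B', 'bb', '4', '10'],
     ['C', 'ca', '6', '50']]

# Same rows, shuffled so that keys are no longer contiguous
unsorted = [['A', 'aa', '1', '300'],
            ['B', 'ba', '5', '50'],
            ['A', 'ac', '3', '60'],
            ['C', 'ca', '6', '50'],
            ['B', 'bb', '4', '10']]

def last_occurrences(rows):
    # Later rows overwrite earlier ones with the same key, so each value ends
    # up being the last row seen for that key. Key order stays the order of
    # first appearance, because reassigning a key does not move it.
    latest = {}
    for row in rows:
        latest[row[0]] = row
    return list(latest.values())

print([row[2] for row in last_occurrences(unsorted)])  # ['3', '4', '6']
```

This is still a single pass, but it trades the sortedness requirement for a dict holding one row per distinct key.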

If the list is sorted, a single pass through it should be enough:

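def tidy(l):
    """Return the last row of each run of equal first elements (l must be sorted)."""
    if not l:
        return []
    tmp = []
    prev_row = l[0]

    for row in l:
        # A change of key means prev_row was the last row of its group
        if row[0] != prev_row[0]:
            tmp.append(prev_row)
        prev_row = row
    # The final row always closes the last group
    tmp.append(prev_row)
    return tmp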
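The same single-pass idea can also be written by pairing each row with its successor: a row is the last of its group exactly when the next row has a different key, and the final row always closes a group. This is an alternative formulation, not the answer's code; it assumes sorted input, and the name `last_of_runs` is hypothetical.

```python
l = [['A', 'aa', '1', '300'],
     ['A', 'ab', '2', '30'],
     ['A', 'ac', '3', '60'],
     ['B', 'ba', '5', '50'],
     ['B', 'bb', '4', '10'],
     ['C', 'ca', '6', '50']]

def last_of_runs(rows):
    """Return the last row of each run of equal first elements.

    Assumes rows is sorted (grouped) by its first element.
    """
    # A row closes its run when the next row has a different key;
    # rows[-1:] appends the final row, or nothing if rows is empty.
    return [cur for cur, nxt in zip(rows, rows[1:]) if cur[0] != nxt[0]] + rows[-1:]

print([row[2] for row in last_of_runs(l)])  # ['3', '4', '6']
```

Note that `rows[1:]` makes a shallow copy of the list, so this version uses a little more memory than the explicit loop.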

In a timeit test, this was roughly 5x faster than itertools.groupby. Demonstration: https://repl.it/C5Af/0
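To reproduce a comparison like this locally, here is a minimal `timeit` harness. `with_groupby` is a hypothetical wrapper around the groupby loop, the iteration counts are arbitrary, and no particular numbers are claimed: timings depend on the machine, Python version, and input size.

```python
import timeit
from functools import partial
from itertools import groupby

l = [['A', 'aa', '1', '300'],
     ['A', 'ab', '2', '30'],
     ['A', 'ac', '3', '60'],
     ['B', 'ba', '5', '50'],
     ['B', 'bb', '4', '10'],
     ['C', 'ca', '6', '50']]

def tidy(l):
    # Single pass over sorted input, as in the answer, with an empty-list guard
    if not l:
        return []
    tmp = []
    prev_row = l[0]
    for row in l:
        if row[0] != prev_row[0]:
            tmp.append(prev_row)
        prev_row = row
    tmp.append(prev_row)
    return tmp

def with_groupby(l):
    # Hypothetical helper: the itertools.groupby approach wrapped in a function
    return [list(group)[-1] for _, group in groupby(l, lambda x: x[0])]

# Both approaches must agree before their timings mean anything
assert tidy(l) == with_groupby(l)

N = 10_000
for name, fn in [("tidy", tidy), ("groupby", with_groupby)]:
    # Best of 5 repeats of N calls each; the minimum is least affected by noise
    best = min(timeit.repeat(partial(fn, l), number=N, repeat=5))
    print(f"{name}: {best / N * 1e6:.2f} us per call")
```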

[Edit: the OP has updated the question to say they're already using pandas to group the data, which may already be much faster.]
