m3clov3n m3clov3n - 2 months ago 9
Python Question

Sorting and Grouping Nested Lists in Python

I have the following data structure (a list of lists)

[
['4', '21', '1', '14', '2008-10-24 15:42:58'],
['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
['5', '21', '3', '19', '2008-10-24 15:45:45'],
['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
['7', '22', '3', '2somename', '2008-10-24 15:45:51']
]


I would like to be able to


  1. Use a function to reorder the list so that I can group by each item in the list. For example I'd like to be able to group by the second column (so that all the 21's are together)

  2. Use a function to only display certain values from each inner list. For example i'd like to reduce this list to only contain the 4th field value of '2somename'



so the list would look like this

[
['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
['7', '22', '3', '2somename', '2008-10-24 15:45:51']
]

Answer

For the first question, the first thing you should do is sort the list by the second field:

x = [
 ['4', '21', '1', '14', '2008-10-24 15:42:58'], 
 ['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
 ['5', '21', '3', '19', '2008-10-24 15:45:45'], 
 ['6', '21', '1', '1somename', '2008-10-24 15:45:49'], 
 ['7', '22', '3', '2somename', '2008-10-24 15:45:51']
]

from operator import itemgetter

x.sort(key=itemgetter(1))

Then you can use itertools' groupby function:

from itertools import groupby
y = groupby(x, itemgetter(1))

Now y is an iterator containing tuples of (element, item iterator). It's more confusing to explain these tuples than it is to show code:

for elt, items in groupby(x, itemgetter(1)):
    print(elt, items)
    for i in items:
        print(i)

Which prints:

21 <itertools._grouper object at 0x511a0>
['4', '21', '1', '14', '2008-10-24 15:42:58']
['5', '21', '3', '19', '2008-10-24 15:45:45']
['6', '21', '1', '1somename', '2008-10-24 15:45:49']
22 <itertools._grouper object at 0x51170>
['3', '22', '4', '2somename', '2008-10-24 15:22:03']
['7', '22', '3', '2somename', '2008-10-24 15:45:51']

For the second part, you should use list comprehensions as mentioned already here:

from pprint import pprint as pp
pp([y for y in x if y[3] == '2somename'])

Which prints:

[['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
 ['7', '22', '3', '2somename', '2008-10-24 15:45:51']]