Jstuff Jstuff - 4 months ago 10
Python Question

Grouping list by nth element

I have a 2D list like the one below:

original_list = [['2', 'Out', 'Words', 'Test3', '21702-1201', 'US', 41829.0, 'VN', 'Post', 'NAI'],
['Test', 'Info', 'More Info', 'Stuff', '63123-7802', 'US', 40942.0, 'CM', 'User Info', 'VAI'],
['Test1', 'Info1', 'More Info1', 'Stuff1', '63123-7802', 'US', 40942.0, 'CM', 'User Info1', 'VAI'],
['1', 'Information', 'Things', 'NE', '68064-9782', 'US', 40106.0, 'DRH', 'Another User', 'NAI'],]


I have already sorted the list by zip code. Now I want to split the list by zip code, which is the 5th element, and group rows with the same zip code into new lists. I would also like to sort them by the first 5 digits of the zip code, ignoring the last 4. How can I do this? I tried to use the zip function, but I could not get it to group the rows together.

Edit:

The desired output would look like this.

new_list1 = ['2', 'Out', 'Words', 'Test3', '21702-1201', 'US', 41829.0, 'VN', 'Post', 'NAI']
new_list2 = [['Test', 'Info', 'More Info', 'Stuff', '63123-7802', 'US', 40942.0, 'CM', 'User Info', 'VAI'],
['Test1', 'Info1', 'More Info1', 'Stuff1', '63123-7802', 'US', 40942.0, 'CM', 'User Info1', 'VAI']]
new_list3 = ['1', 'Information', 'Things', 'NE', '68064-9782', 'US', 40106.0, 'DRH', 'Another User', 'NAI']


Regarding the second part of the question: if two rows had zip codes with the same first 5 digits but different last 4 digits, they should be grouped together. For example, if two of the zip codes above were 63123-7802 and 63123-8956, those rows would end up in the same group.

Answer

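You can use itertools.groupby. It only groups consecutive items that share a key, so the list must already be sorted by zip code, as yours is: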

>>> from itertools import groupby
>>> l = [['2', 'Out', 'Words', 'Test3', '21702-1201', 'US', 41829.0, 'VN', 'Post', 'NAI'],
...      ['Test', 'Info', 'More Info', 'Stuff', '63123-7802', 'US', 40942.0, 'CM', 'User Info', 'VAI'],
...      ['Test1', 'Info1', 'More Info1', 'Stuff1', '63123-7802', 'US', 40942.0, 'CM', 'User Info1', 'VAI'],
...      ['1', 'Information', 'Things', 'NE', '68064-9782', 'US', 40106.0, 'DRH', 'Another User', 'NAI'],]
>>> zip_retriever = lambda sub_l: sub_l[4].split('-')[0] # Grab the part leading up to '-' in the zip code
>>> for zip_code, vals in groupby(l, zip_retriever):
...     print(zip_code, list(vals))
...     
21702 [['2', 'Out', 'Words', 'Test3', '21702-1201', 'US', 41829.0, 'VN', 'Post', 'NAI']]
63123 [['Test', 'Info', 'More Info', 'Stuff', '63123-7802', 'US', 40942.0, 'CM', 'User Info', 'VAI'], ['Test1', 'Info1', 'More Info1', 'Stuff1', '63123-7802', 'US', 40942.0, 'CM', 'User Info1', 'VAI']]
68064 [['1', 'Information', 'Things', 'NE', '68064-9782', 'US', 40106.0, 'DRH', 'Another User', 'NAI']]
>>> 
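
If the rows are not guaranteed to be sorted by the 5-digit prefix, one option is to sort first and then collect each group into its own list. The following is only a sketch under that assumption: the helper names zip_prefix and group_by_zip_prefix are made up for illustration, and the sample rows reuse the question's data with one zip changed to 63123-8956 to cover the second part of the question.

```python
from itertools import groupby


def zip_prefix(row):
    """Return the part of the zip code (index 4) before the '-'."""
    return row[4].split('-')[0]


def group_by_zip_prefix(rows):
    """Sort rows by zip prefix, then collect each run of equal prefixes into a list."""
    ordered = sorted(rows, key=zip_prefix)  # groupby needs equal keys to be adjacent
    return [list(group) for _, group in groupby(ordered, key=zip_prefix)]


# Deliberately unsorted input; two rows share the prefix 63123.
rows = [
    ['Test1', 'Info1', 'More Info1', 'Stuff1', '63123-8956', 'US', 40942.0, 'CM', 'User Info1', 'VAI'],
    ['1', 'Information', 'Things', 'NE', '68064-9782', 'US', 40106.0, 'DRH', 'Another User', 'NAI'],
    ['Test', 'Info', 'More Info', 'Stuff', '63123-7802', 'US', 40942.0, 'CM', 'User Info', 'VAI'],
    ['2', 'Out', 'Words', 'Test3', '21702-1201', 'US', 41829.0, 'VN', 'Post', 'NAI'],
]

grouped = group_by_zip_prefix(rows)
for group in grouped:
    print(zip_prefix(group[0]), len(group))
```

Because sorted is stable, rows that share a prefix keep their original relative order inside each group. Note that the prefixes are compared as strings, which works here because they all have five digits.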