Madmartigan - 6 months ago
Python Question

python groupby itertools list methods

I have a list like this:
#[YEAR, DAY, VALUE1, VALUE2, VALUE3]

[[2014, 1, 10, 20, 30],
[2014, 1, 3, 7, 4],
[2014, 2, 14, 43, 5],
[2014, 2, 33, 1, 6],
...
[2013, 1, 34, 54, 3],
[2013, 2, 23, 33, 2],
...]


and I need to group by years and days, to obtain something like:

[[2014, 1, sum(all values1 with day=1), sum(all values2 with day=1), avg(all values3 with day=1)],
[2014, 2, sum(all values1 with day=2), sum(all values2 with day=2), avg(all values3 with day=2)],
....
[2013, 1, sum(all values1 with day=1), sum(all values2 with day=1), avg(all values3 with day=1)],
[2013, 2, sum(all values1 with day=2), sum(all values2 with day=2), avg(all values3 with day=2)],
....]


How can I do that with itertools? I can't use pandas or numpy because my system doesn't support them. Thanks a lot for your help.

Answer
import itertools
import operator

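key = operator.itemgetter(0, 1)  # group on (year, day)
my_list.sort(key=key)  # my_list is the list from the question; groupby needs it sorted by the same key
for (year, day), records in itertools.groupby(my_list, key):
    print("Records on", year, day, ":")
    for record in records:
        print(record)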
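Building on the loop above, the sums and the average asked for in the question can be computed per group. This is only a minimal sketch, not the answerer's own code: the helper name `aggregate` is made up for illustration, the sample rows are the ones visible in the question (the elided "..." rows are left out), and it assumes Python 3 and exactly five fields per record.

```python
import itertools
import operator


def aggregate(records):
    """Group [year, day, v1, v2, v3] rows by (year, day).

    Returns one [year, day, sum(v1), sum(v2), avg(v3)] row per group.
    `aggregate` is a hypothetical helper name, not part of itertools.
    """
    key = operator.itemgetter(0, 1)
    result = []
    # groupby only merges adjacent rows, so sort by the same key first.
    for (year, day), group in itertools.groupby(sorted(records, key=key), key):
        rows = list(group)  # a group iterator can only be consumed once
        total1 = sum(row[2] for row in rows)
        total2 = sum(row[3] for row in rows)
        average3 = sum(row[4] for row in rows) / len(rows)
        result.append([year, day, total1, total2, average3])
    return result


# Sample rows copied from the question.
my_list = [
    [2014, 1, 10, 20, 30],
    [2014, 1, 3, 7, 4],
    [2014, 2, 14, 43, 5],
    [2014, 2, 33, 1, 6],
    [2013, 1, 34, 54, 3],
    [2013, 2, 23, 33, 2],
]

for row in aggregate(my_list):
    print(row)
```

Because the rows are sorted in ascending order, 2013 comes out before 2014 here. If the newest year should come first, as in the question's example, passing `reverse=True` to `sorted` would probably be the simplest change.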

itertools.groupby doesn't work like SQL's GROUP BY. It only groups consecutive elements, in the order they appear. This means that if the list is not sorted by the grouping key, you may get several groups with the same key. For example, say you want to group a list of integers by parity (even vs. odd); you might do this:

L = [1,2,3,4,5,7,8]  # notice that there's no 6 in the list
itertools.groupby(L, lambda i: i % 2)

Now, if you come from the SQL world, you might expect this to give you two groups: one for the even numbers and one for the odd numbers. While that expectation makes sense, it is not how itertools.groupby works. It considers each element in turn and checks whether its key matches the key of the previous element. If so, the element joins the current group; otherwise, a new group starts with that element.

So with the above list, we get:

key: 1
elements: [1]

key: 0
elements: [2]

key: 1
elements: [3]

key: 0
elements: [4]

key: 1
elements: [5,7]  # see what happened here?

key: 0
elements: [8]
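To check this behaviour yourself, the groups can be turned into lists and printed. A minimal sketch, assuming Python 3; the variable name `groups` is just for illustration. Note that `groupby` returns a lazy iterator, so each group has to be consumed (here with `list`) before moving on to the next one.

```python
import itertools

L = [1, 2, 3, 4, 5, 7, 8]  # unsorted with respect to parity

# Materialize each (key, group) pair; a new group starts whenever the key changes.
groups = [(key, list(group)) for key, group in itertools.groupby(L, lambda i: i % 2)]

for key, elements in groups:
    print("key:", key, "elements:", elements)
```

Only 5 and 7 end up sharing a group, because they are the only neighbours with the same parity; the trailing 8 forms a group of its own.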

So if you're looking for a grouping like SQL's, you'll want to sort the list beforehand, using the same key (criterion) that you want to group by:

L = [1,2,3,4,5,7,8]  # notice that there's no 6 in the list
L.sort(key=lambda i: i % 2)  # now L looks like this: [2,4,8,1,3,5,7] - the evens and the odds stick together
itertools.groupby(L, lambda i: i % 2)  # this gives two groups, one per key, each containing all the elements with that key
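The sorted variant can be verified the same way. This sketch, assuming Python 3, collects the groups into a dict; the names `parity` and `grouped` are made up for illustration. Using one named function for both the sort and the grouping avoids the two keys drifting apart.

```python
import itertools


def parity(i):
    """Grouping key: 0 for even numbers, 1 for odd numbers."""
    return i % 2


L = [1, 2, 3, 4, 5, 7, 8]
L.sort(key=parity)  # stable sort: evens first, original order kept within each parity

# After sorting, every key appears in exactly one run, so a dict loses nothing.
grouped = {key: list(group) for key, group in itertools.groupby(L, parity)}
print(grouped)
```

Building a dict like this is only safe after sorting; on unsorted input a later group with the same key would silently overwrite an earlier one.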