PR102012 - 7 months ago 38

Python Question

I have a calculation that I need to aggregate, but it will not work with GROUPBY in PANDAs. So, I'm stuck with iterating over the groups manually. The groups are defined by 2 fields of 'object' type values, which are essentially the categories.

I think an elegant solution may be to create 2 lists from the unique values in the 2 independent columns with categorical values.

Then create a 'for' loop, and using string values or something, iterate through my PANDAs conditional statement to create a DataFrame; which then eventually does my aggregate calc. This occurs over and over, with only the dataframe with aggregate calculation being kept in memory, with some append of a counter value like '1' to the end of 'df_'. In order to not overwrite each time through the loop. Here is my psuedo code.

`cats1=['blue','yellow','pink']`

cats2=['dog','horse','cow','sheep']

lengths=list(itertools.product(cats1,cats2))

for x,y,z in zip(cats1,cats2,lengths):

df = main_df[ (main_df['col2']==x) & (main_df['col3']==y) ]

df['aggcalc'] = df['col1'].agg.mean()

locals()['df_{0}'.format(z)] = df

The last line will hopefully create the persistent dataframe based on the number of combinations of 'cats1' and 'cats2'. ie, "df_1", "df_2", etc... Then the "df" in the 1st 2 lines just gets overwritten each time in the 'for' loop. Is this correct thinking?

EDIT..............

Here is a simpler way to look at it.

I want to loop through all possible combinations from 2 independent, varying-length lists. Additionally, I want in each loop to have a 'counter', 'z'. This is the current way to write this and subsequent output:

`for x,y in list(itertools.product(cats1,cats2)):`

print x,y

blue dog

blue horse

blue cow

blue sheep

yellow dog

yellow horse

yellow cow

yellow sheep

pink dog

pink horse

pink cow

pink sheep

How do I add to this output a 'z' variable which will make the output look like

`blue dog 0`

blue horse 1

blue cow 2

blue sheep 3

yellow dog 4

...etc

Answer

The simple answer to your edit is to just use `enumerate`

:

```
for z, (x, y) in enumerate(itertools.product(cats1, cats2)):
print x, y, z
blue dog 0
blue horse 1
blue cow 2
blue sheep 3
yellow dog 4
yellow horse 5
yellow cow 6
yellow sheep 7
pink dog 8
pink horse 9
pink cow 10
pink sheep 11
```

I strongly suspect that you're missing a simpler solution with `groupby`

though, and so I'd recommend posting a new question with dummy data and details of what aggregation you're trying to perform.