PR102012 PR102012 - 1 year ago 138
Python Question

using Python list to set conditional pandas statement

I have a calculation that I need to aggregate, but it will not work with GROUPBY in PANDAs. So, I'm stuck with iterating over the groups manually. The groups are defined by 2 fields of 'object' type values, which are essentially the categories.

I think an elegant solution may be to create 2 lists from the unique values in the 2 independent columns with categorical values.
Then create a 'for' loop, and using string values or something, iterate through my PANDAs conditional statement to create a DataFrame; which then eventually does my aggregate calc. This occurs over and over, with only the dataframe with aggregate calculation being kept in memory, with some append of a counter value like '1' to the end of 'df_'. In order to not overwrite each time through the loop. Here is my psuedo code.


for x,y,z in zip(cats1,cats2,lengths):
df = main_df[ (main_df['col2']==x) & (main_df['col3']==y) ]
df['aggcalc'] = df['col1'].agg.mean()
locals()['df_{0}'.format(z)] = df

The last line will hopefully create the persistent dataframe based on the number of combinations of 'cats1' and 'cats2'. ie, "df_1", "df_2", etc... Then the "df" in the 1st 2 lines just gets overwritten each time in the 'for' loop. Is this correct thinking?

Here is a simpler way to look at it.
I want to loop through all possible combinations from 2 independent, varying-length lists. Additionally, I want in each loop to have a 'counter', 'z'. This is the current way to write this and subsequent output:

for x,y in list(itertools.product(cats1,cats2)):
print x,y

blue dog
blue horse
blue cow
blue sheep
yellow dog
yellow horse
yellow cow
yellow sheep
pink dog
pink horse
pink cow
pink sheep

How do I add to this output a 'z' variable which will make the output look like

blue dog 0
blue horse 1
blue cow 2
blue sheep 3
yellow dog 4


Answer Source

The simple answer to your edit is to just use enumerate:

for z, (x, y) in enumerate(itertools.product(cats1, cats2)):
    print x, y, z

blue dog 0
blue horse 1
blue cow 2
blue sheep 3
yellow dog 4
yellow horse 5
yellow cow 6
yellow sheep 7
pink dog 8
pink horse 9
pink cow 10
pink sheep 11

I strongly suspect that you're missing a simpler solution with groupby though, and so I'd recommend posting a new question with dummy data and details of what aggregation you're trying to perform.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download