swepab swepab - 4 months ago 15
Python Question

Using pandas group-by to select specific sub-groups

I have a pandas dataframe on the following form:



id grp values1 values2
0 1 a_1 2 4
1 1 a_2 3 6
2 1 b_1 4 8
3 2 b_2 5 10
4 2 c_1 6 12
5 3 z_1 7 14
6 4 y_1 8 16
7 5 a_1 9 18
8 5 a_2 10 20
9 5 a_1 11 22
10 5 a_2 12 24
11 7 z_1 13 26
12 8 q_1 14 28
13 9 a_2 15 30
14 9 a_1 16 32



What I wand is a reduced data frame with the rows and ids containing "a_1" and "a_2" only for the ID-group like below.


id grp values1 values2

7 5 a_1 9 18
8 5 a_2 10 20
9 5 a_1 11 22
10 5 a_2 12 24
13 9 a_2 15 30
14 9 a_1 16 32



The end data frame should have even numbers when doing a count on the id variable since the group values goes in "pairs", i.e. "a_1" and "a_2" together. Also, this should work for an arbitrary number of "group by columns" like the id column is.

The code for the dataframe is posted below:

df = pd.DataFrame({'id' : [1,1,1,2,2,3,4,5,5,5,5,7,8,9,9]
,'grp' : ['a_1', 'a_2', 'b_1', 'b_2', 'c_1', 'z_1', 'y_1', 'a_1', 'a_2', 'a_1', 'a_2', 'z_1', 'q_1', 'a_2','a_1']
,'values1' : [i for i in range(2,17,1)]
,'values2' : [i for i in range(4,33,2)]
})


Thanks for considerations, time and input!

Answer

Or you could take the easier way and:

 filtered_df  = df.ix[(df['grp'] == 'a_1') | (df['grp'] == 'a_2')]