ch36r5s ch36r5s - 2 months ago 12
Python Question

python & pandas - How to calculate frequency under conditions in columns in DataFrame?

I have a series of data in a DataFrame called

frames
:

NoUsager Sens IdVehiculeUtilise NoConducteur NoAdresse Fait NoDemande Periods
0 000001 + 287Véh 000087 000079 1 42196000013 Matin
1 000001 - 287Véh 000087 000079 1 42196000013 Matin
2 000314 + 263Véh 000077 006470 1 42196000002 Matin
3 002372 + 287Véh 000087 002932 1 42196000016 Matin
4 000466 + 287Véh 000087 002932 1 42196000015 Matin
5 000314 - 263Véh 000077 000456 1 42196000002 Matin
6 000466 - 287Véh 000087 004900 1 42196000015 Matin
7 002372 - 287Véh 000087 007072 1 42196000016 Matin
8 002641 + 263Véh 000077 007225 1 42196000004 Soir
9 002641 - 263Véh 000077 000889 1 42196000004 Soir
10 000382 + 263Véh 000077 002095 1 42196000006 Soir
11 002641 + 287Véh 000087 000889 1 42196000019 Soir
12 000382 - 263Véh 000077 006168 1 42196000006 Soir
13 002641 - 287Véh 000087 007225 1 42196000019 Soir
14 001611 + 287Véh 000087 004236 -1 42196000021 Soir
15 002785 + 263Véh 000077 007482 1 42196000007 Soir
16 002372 + 287Véh 000087 007072 1 42196000022 Soir
17 002785 - 263Véh 000077 007483 1 42196000007 Soir
18 000466 + 287Véh 000087 004900 1 42196000023 Soir
19 000382 + 263Véh 000077 006168 1 42196000008 Soir


For each
Usager
, depending on
Sens
and
Periods
, they can have more than one related address. I want to know for all the
Usager
, how many
address
do they have and the frequency of each address.
I used
frames.set_index(['NoUsager','NoAdresse'])
to make it looks like:




EDIT

New pic

I don't want all the other columns but only a new one with the result of frequency. In which way I can do it? Can I use
pivot()
to do it?

Any help will be really appreciated!

Answer

I think you need groupby by columns which will be indexes (NoUsager,Sens,Periods) in output df. Then need add column (NoAdresse) as last item in list in groupby, which is converted by unstack to columns in output. And you need aggregate by size.

df = df.groupby(['NoUsager','Sens','Periods', 'NoAdresse']).size().unstack(fill_value=0)
print (df)
NoAdresse              79    456   889   2095  2932  4236  4900  6168  6470  \
NoUsager Sens Periods                                                         
1        +    Matin       1     0     0     0     0     0     0     0     0   
         -    Matin       1     0     0     0     0     0     0     0     0   
314      +    Matin       0     0     0     0     0     0     0     0     1   
         -    Matin       0     1     0     0     0     0     0     0     0   
382      +    Soir        0     0     0     1     0     0     0     1     0   
         -    Soir        0     0     0     0     0     0     0     1     0   
466      +    Matin       0     0     0     0     1     0     0     0     0   
              Soir        0     0     0     0     0     0     1     0     0   
         -    Matin       0     0     0     0     0     0     1     0     0   
1611     +    Soir        0     0     0     0     0     1     0     0     0   
2372     +    Matin       0     0     0     0     1     0     0     0     0   
              Soir        0     0     0     0     0     0     0     0     0   
         -    Matin       0     0     0     0     0     0     0     0     0   
2641     +    Soir        0     0     1     0     0     0     0     0     0   
         -    Soir        0     0     1     0     0     0     0     0     0   
2785     +    Soir        0     0     0     0     0     0     0     0     0   
         -    Soir        0     0     0     0     0     0     0     0     0   

NoAdresse              7072  7225  7482  7483  
NoUsager Sens Periods                          
1        +    Matin       0     0     0     0  
         -    Matin       0     0     0     0  
314      +    Matin       0     0     0     0  
         -    Matin       0     0     0     0  
382      +    Soir        0     0     0     0  
         -    Soir        0     0     0     0  
466      +    Matin       0     0     0     0  
              Soir        0     0     0     0  
         -    Matin       0     0     0     0  
1611     +    Soir        0     0     0     0  
2372     +    Matin       0     0     0     0  
              Soir        1     0     0     0  
         -    Matin       1     0     0     0  
2641     +    Soir        0     1     0     0  
         -    Soir        0     1     0     0  
2785     +    Soir        0     0     1     0  
         -    Soir        0     0     0     1  

If need reset index:

df = df.groupby(['NoUsager','Sens','Periods', 'NoAdresse'])
       .size()
       .unstack(fill_value=0)
       .reset_index()
       .rename_axis(None, axis=1)

print (df)
    NoUsager Sens Periods  79  456  889  2095  2932  4236  4900  6168  6470  \
0          1    +   Matin   1    0    0     0     0     0     0     0     0   
1          1    -   Matin   1    0    0     0     0     0     0     0     0   
2        314    +   Matin   0    0    0     0     0     0     0     0     1   
3        314    -   Matin   0    1    0     0     0     0     0     0     0   
4        382    +    Soir   0    0    0     1     0     0     0     1     0   
5        382    -    Soir   0    0    0     0     0     0     0     1     0   
6        466    +   Matin   0    0    0     0     1     0     0     0     0   
7        466    +    Soir   0    0    0     0     0     0     1     0     0   
8        466    -   Matin   0    0    0     0     0     0     1     0     0   
9       1611    +    Soir   0    0    0     0     0     1     0     0     0   
10      2372    +   Matin   0    0    0     0     1     0     0     0     0   
11      2372    +    Soir   0    0    0     0     0     0     0     0     0   
12      2372    -   Matin   0    0    0     0     0     0     0     0     0   
13      2641    +    Soir   0    0    1     0     0     0     0     0     0   
14      2641    -    Soir   0    0    1     0     0     0     0     0     0   
15      2785    +    Soir   0    0    0     0     0     0     0     0     0   
16      2785    -    Soir   0    0    0     0     0     0     0     0     0   

    7072  7225  7482  7483  
0      0     0     0     0  
1      0     0     0     0  
2      0     0     0     0  
3      0     0     0     0  
4      0     0     0     0  
5      0     0     0     0  
6      0     0     0     0  
7      0     0     0     0  
8      0     0     0     0  
9      0     0     0     0  
10     0     0     0     0  
11     1     0     0     0  
12     1     0     0     0  
13     0     1     0     0  
14     0     1     0     0  
15     0     0     1     0  
16     0     0     0     1  

Another solution with crosstab:

df = pd.crosstab([df.NoUsager,df.Sens,df.Periods], df.NoAdresse)
       .reset_index()
       .rename_axis(None, axis=1)

print (df)
    NoUsager Sens Periods  79  456  889  2095  2932  4236  4900  6168  6470  \
0          1    +   Matin   1    0    0     0     0     0     0     0     0   
1          1    -   Matin   1    0    0     0     0     0     0     0     0   
2        314    +   Matin   0    0    0     0     0     0     0     0     1   
3        314    -   Matin   0    1    0     0     0     0     0     0     0   
4        382    +    Soir   0    0    0     1     0     0     0     1     0   
5        382    -    Soir   0    0    0     0     0     0     0     1     0   
6        466    +   Matin   0    0    0     0     1     0     0     0     0   
7        466    +    Soir   0    0    0     0     0     0     1     0     0   
8        466    -   Matin   0    0    0     0     0     0     1     0     0   
9       1611    +    Soir   0    0    0     0     0     1     0     0     0   
10      2372    +   Matin   0    0    0     0     1     0     0     0     0   
11      2372    +    Soir   0    0    0     0     0     0     0     0     0   
12      2372    -   Matin   0    0    0     0     0     0     0     0     0   
13      2641    +    Soir   0    0    1     0     0     0     0     0     0   
14      2641    -    Soir   0    0    1     0     0     0     0     0     0   
15      2785    +    Soir   0    0    0     0     0     0     0     0     0   
16      2785    -    Soir   0    0    0     0     0     0     0     0     0   

    7072  7225  7482  7483  
0      0     0     0     0  
1      0     0     0     0  
2      0     0     0     0  
3      0     0     0     0  
4      0     0     0     0  
5      0     0     0     0  
6      0     0     0     0  
7      0     0     0     0  
8      0     0     0     0  
9      0     0     0     0  
10     0     0     0     0  
11     1     0     0     0  
12     1     0     0     0  
13     0     1     0     0  
14     0     1     0     0  
15     0     0     1     0  
16     0     0     0     1  

EDIT by comment:

I think you need only aggregate size:

df = df.groupby(['NoUsager','NoAdresse']).size().reset_index(name='Count')
print (df)
    NoUsager  NoAdresse  Count
0          1         79      2
1        314        456      1
2        314       6470      1
3        382       2095      1
4        382       6168      2
5        466       2932      1
6        466       4900      2
7       1611       4236      1
8       2372       2932      1
9       2372       7072      2
10      2641        889      2
11      2641       7225      2
12      2785       7482      1
13      2785       7483      1

If need set indexes, you can use another solution - rename Series name and then call to_frame:

df = df.groupby(['NoUsager','NoAdresse']).size().rename('Count').to_frame()
                    Count
NoUsager NoAdresse       
1        79             2
314      456            1
         6470           1
382      2095           1
         6168           2
466      2932           1
         4900           2
1611     4236           1
2372     2932           1
         7072           2
2641     889            2
         7225           2
2785     7482           1
         7483           1

Or add set_index:

df = df.groupby(['NoUsager','NoAdresse'])
       .size()
       .reset_index(name='Count')
       .set_index(['NoUsager','NoAdresse'])
print (df)
                    Count
NoUsager NoAdresse       
1        79             2
314      456            1
         6470           1
382      2095           1
         6168           2
466      2932           1
         4900           2
1611     4236           1
2372     2932           1
         7072           2
2641     889            2
         7225           2
2785     7482           1
         7483           1
Comments