pasitrauk pasitrauk - 12 days ago 8
Python Question

How to get a cross tabulation with pandas crosstab that would display the frequency of multiple values of a column variable?

Let's say i have a dataframe:

df = pd.DataFrame(np.random.randint(0,5, size=(5,6)), columns=list('ABCDEF'))


Crossing variables with
pd.crosstab
is simple enough:

table = pd.crosstab(index=df['A'], columns=df['B'])


Yields:

B 1 2 3 4
A
0 1 0 0 0
1 0 0 0 1
2 0 1 1 0
3 0 1 0 0


Where I would for example want a table like this:

B (1+2+3) 1 2 3 4
A
0 1 1 0 0 0
1 0 0 0 0 1
2 2 0 1 1 0
3 1 0 1 0 0


Can anyone set me on the right track here?

Answer

Use sum with subset, but if use small random df there can be problem you get always another values so values of columns will be different. If use np.random.seed(100) get same test output as my answer.

table['(1+2+3)'] = table[[1,2,3]].sum(axis=1)

Sample:

np.random.seed(100)
df = pd.DataFrame(np.random.randint(0,5, size=(5,6)), columns=list('ABCDEF'))
table = pd.crosstab(index=df['A'], columns=df['B'])
table['(1+2+3)'] = table[[1,2,3]].sum(axis=1)
print (table)
B  0  1  2  3  4  (1+2+3)
A                        
0  1  0  0  0  1        0
1  0  0  0  1  0        1
2  0  0  1  0  0        1
3  0  1  0  0  0        1
Comments