Jack Florey - 7 months ago
Python Question

Pandas: "distribute" column values into multiple rows

I am a pandas newbie, and I am trying to solve the following problem.

I have a large dataframe (10000 x 28) with structure as follows.

Col1 Col2 Col3 Col4 Col5
A B C D E


How can I reshape it to look like the following?

Col1 Col2 NewColName
A B C
A B D
A B E


The first two columns remain the same, and the last three columns get "distributed" over them.

My goal is to take a large data frame and turn it into "tidy data".

I have tried methods like concat, stack etc. but I feel I'm missing some elegant way to solve this.

Any help is greatly appreciated. Many thanks!

Answer

Set ['Col1', 'Col2'] as the index and use .stack().

df.set_index(['Col1', 'Col2']).stack()

Col1  Col2
A     B     Col3    C
            Col4    D
            Col5    E
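To make the intermediate result easier to inspect, here is a minimal sketch of the stacking step. The one-row frame is a hypothetical stand-in for the data in the question, and the variable name stacked is just for illustration.

```python
import pandas as pd

# Hypothetical one-row frame mirroring the layout in the question.
df = pd.DataFrame(
    [["A", "B", "C", "D", "E"]],
    columns=["Col1", "Col2", "Col3", "Col4", "Col5"],
)

# Move the columns to keep into the index, then pivot the rest into rows.
stacked = df.set_index(["Col1", "Col2"]).stack()

# The result is a Series with a three-level index; the innermost level
# holds the labels of the columns that were stacked.
print(type(stacked).__name__)
print(stacked.index.nlevels)
print(list(stacked.index.get_level_values(-1)))
print(stacked.tolist())
```

That innermost level is the part that still has to be dropped to match the layout asked for in the question.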

Then do .reset_index() to format it as in your example (you can also add name='Col' to name the value column, as suggested by @jezrael):

df.set_index(['Col1', 'Col2']).stack().reset_index(-1, drop=True).reset_index(name='Col')

  Col1 Col2 Col
0    A    B   C
1    A    B   D
2    A    B   E
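As an alternative that is not part of the answer above, DataFrame.melt can produce the same shape in one step. This is only a sketch under the same assumptions: the one-row frame is a hypothetical stand-in for the real data, and source_col is an illustrative name for the helper column that gets dropped.

```python
import pandas as pd

# Hypothetical one-row frame mirroring the layout in the question.
df = pd.DataFrame(
    [["A", "B", "C", "D", "E"]],
    columns=["Col1", "Col2", "Col3", "Col4", "Col5"],
)

tidy = (
    # Keep Col1/Col2 as identifiers; every other column is unpivoted into rows.
    df.melt(id_vars=["Col1", "Col2"], var_name="source_col", value_name="NewColName")
      # Drop the helper column that records which original column each value came from.
      .drop(columns="source_col")
)
print(tidy)
```

One difference to keep in mind with more than one input row: melt orders the result column by column, while the stack approach orders it row by row, so a sort may be needed if the order matters.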