Zeke Zeke - 1 year ago 292
Python Question

Pandas: Splitting a Row Into Multiple Rows Efficiently

The problem I am currently facing is taking a pandas DataFrame and efficiently taking each record and breaking it down into multiple records in the following way:


In [16]: pd.DataFrame({'Name': 'Person1', 'State': 'Indiana', 'Money1': 100.42, 'Money2':54.54, 'Money3': 23.45}, index=[1])
Money1 Money2 Money3 Name State
1 100.42 54.54 23.45 Person1 Indiana


Money1 Money2 Money3 Name State
1 100.42 np.nan np.nan Person1 Indiana
2 np.nan 54.54 np.nan Person1 Indiana
3 np.nan np.nan 23.45 Person1 Indiana

Essentially, the problem is splitting the original record into x records where x is a list of passed in columns to split by (in this case 'Money1', 'Money2', 'Money3'. I have tried doing this by creating DataFrames and concat'ing them but this is extremely slow and memory inefficient.


Please not that the answer does not work if even one of your static columns (the ones that are turned into multi-indices) is full of NaN. This is a reported bug in pandas:

To get around it, use
to fill in columns comprised entirely of
with empty string
, for example, and then after this process, put the
back in.

Answer Source

This should work for a dataframe with an arbitrary number of columns.

df = pd.DataFrame({'Name': ['Person1', 'Person2'], 
                   'State': ['Indiana', 'NY'], 
                   'Money1': [100.42, 200], 
                   'Money2': [54.54, 25], 
                   'Money3': [23.45, 10]})

index_cols = ['Name', 'State']
cols = [c for c in df if c not in index_cols]

df2 = df.set_index(index_cols).stack().reset_index(level=2, drop=True).to_frame('Value')

df2 = pd.concat([pd.Series([v if i % len(cols) == n else np.nan 
                            for i, v in enumerate(df2.Value)], name=col) 
                 for n, col in enumerate(cols)], axis=1).set_index(df2.index)

>>> df2.reset_index()
      Name    State  Money1  Money2  Money3
0  Person1  Indiana       1     NaN     NaN
1  Person1  Indiana     NaN      55     NaN
2  Person1  Indiana     NaN     NaN      23
3  Person2       NY       2     NaN     NaN
4  Person2       NY     NaN      25     NaN
5  Person2       NY     NaN     NaN      10