Zeke Zeke - 6 months ago 119
Python Question

Pandas: Splitting a Row Into Multiple Rows Efficiently

The problem I am currently facing is taking a pandas DataFrame and efficiently taking each record and breaking it down into multiple records in the following way:

Input:

In [16]: pd.DataFrame({'Name': 'Person1', 'State': 'Indiana', 'Money1': 100.42, 'Money2':54.54, 'Money3': 23.45}, index=[1])
Out[16]:
Money1 Money2 Money3 Name State
1 100.42 54.54 23.45 Person1 Indiana


Output:

Money1 Money2 Money3 Name State
1 100.42 np.nan np.nan Person1 Indiana
2 np.nan 54.54 np.nan Person1 Indiana
3 np.nan np.nan 23.45 Person1 Indiana


Essentially, the problem is splitting the original record into x records where x is a list of passed in columns to split by (in this case 'Money1', 'Money2', 'Money3'. I have tried doing this by creating DataFrames and concat'ing them but this is extremely slow and memory inefficient.

EDIT1:

Please not that the answer does not work if even one of your static columns (the ones that are turned into multi-indices) is full of NaN. This is a reported bug in pandas:
https://github.com/pydata/pandas/issues/6322

To get around it, use
fillna
or
replace
to fill in columns comprised entirely of
NaN
with empty string
''
, for example, and then after this process, put the
NaN
back in.

Answer

This should work for a dataframe with an arbitrary number of columns.

df = pd.DataFrame({'Name': ['Person1', 'Person2'], 
                   'State': ['Indiana', 'NY'], 
                   'Money1': [100.42, 200], 
                   'Money2': [54.54, 25], 
                   'Money3': [23.45, 10]})

index_cols = ['Name', 'State']
cols = [c for c in df if c not in index_cols]

df2 = df.set_index(index_cols).stack().reset_index(level=2, drop=True).to_frame('Value')

df2 = pd.concat([pd.Series([v if i % len(cols) == n else np.nan 
                            for i, v in enumerate(df2.Value)], name=col) 
                 for n, col in enumerate(cols)], axis=1).set_index(df2.index)

>>> df2.reset_index()
      Name    State  Money1  Money2  Money3
0  Person1  Indiana       1     NaN     NaN
1  Person1  Indiana     NaN      55     NaN
2  Person1  Indiana     NaN     NaN      23
3  Person2       NY       2     NaN     NaN
4  Person2       NY     NaN      25     NaN
5  Person2       NY     NaN     NaN      10