SpicyClubSauce SpicyClubSauce - 2 months ago 41
Python Question

How to explode a list inside a Dataframe cell into separate rows

I'm looking to turn a pandas cell containing a list into rows for each of those values.

So, take this:

enter image description here

If I'd like to unpack and stack the values in the 'nearest_neighbors" column so that each value would be a row within each 'opponent' index, how would I best go about this? Are there pandas methods that are meant for operations like this? I'm just not aware.

Thanks in advance, guys.

Answer

You can do this using a list comprehension over the rows in your dataframe. The basic form is

df_exploded = pd.DataFrame([{'common_col1': row.common_col1, 
                             'common_col2': row.common_col2, 
                           # ... all the other columns you want to keep
                             'exploded_col': e} 
                            for _,row in df.iterrows() for e in row.explode_col])

Alternatively to avoid having to list the columns, you could just explode the column you want, and then join it back to your original dataframe:

exploded = pd.DataFrame([(i, e) for i,row in df.iterrows() 
                         for e in row.explode_col]).set_index(0)[1]

df.join(exploded)

Here's how it would work for your example. In the code below, I first reset the index to make the row iteration easier.

I create a list of lists where each element of the outer list is a row of the target DataFrame and each element of the inner list is one of the columns. This nested list will ultimately be concatenated to create the desired DataFrame.

I use a list comprehension to create a row for each element of the nearest_neighbors paired with the relevant name and opponent.

Finally, I create a new DataFrame from this list (using the original column names and setting the index back to name and opponent).

df = (pd.DataFrame({'name': ['A.J. Price'] * 3, 
                    'opponent': ['76ers', 'blazers', 'bobcats'], 
                    'nearest_neighbors': [['Zach LaVine', 'Jeremy Lin', 'Nate Robinson', 'Isaia']] * 3})
      .set_index(['name', 'opponent']))

>>> df
                                                    nearest_neighbors
name       opponent                                                  
A.J. Price 76ers     [Zach LaVine, Jeremy Lin, Nate Robinson, Isaia]
           blazers   [Zach LaVine, Jeremy Lin, Nate Robinson, Isaia]
           bobcats   [Zach LaVine, Jeremy Lin, Nate Robinson, Isaia]

df.reset_index(inplace=True)

# This is the key step here
rows = [[row.name, row.opponent, nn] 
        for _, row in df.iterrows() for nn in row.nearest_neighbors]

df_new = pd.DataFrame(rows, columns=df.columns).set_index(['name', 'opponent'])

>>> df_new
                    nearest_neighbors
name       opponent                  
A.J. Price 76ers          Zach LaVine
           76ers           Jeremy Lin
           76ers        Nate Robinson
           76ers                Isaia
           blazers        Zach LaVine
           blazers         Jeremy Lin
           blazers      Nate Robinson
           blazers              Isaia
           bobcats        Zach LaVine
           bobcats         Jeremy Lin
           bobcats      Nate Robinson
           bobcats              Isaia