student student - 27 days ago 7
Python Question

Which is the most efficient way of flattening down a pandas dataframe?

I have a large pandas dataframe with 8 columns and several

NaN
values:

0 1 2 3 4 5 6 7 8
1 Google, Inc. (Date 11/07/2016) NaN NaN NaN NaN NaN NaN NaN NaN
2 Apple Inc. (Date 07/01/2016) Amazon (Date 11/01/2016) NaN NaN NaN NaN NaN NaN NaN
3 IBM, Inc. (Date 11/08/2016) NaN NaN NaN NaN NaN NaN NaN NaN
4 Microsoft (Date 11/10/2016) Google, Inc. (Date 11/10/1990) Google, Inc. (Date 11/07/2016) Samsung (Date 05/02/2016) NaN NaN NaN NaN NaN


How can I flatten down it like this:

0 companies
1 Google, Inc. (Date 11/07/2016)
2 Apple Inc. (Date 07/01/2016)
3 Amazon (Date 11/01/2016)
4 IBM, Inc. (Date 11/08/2016)
5 Microsoft (Date 11/10/2016)
6 Google, Inc. (Date 11/10/1990)
7 Google, Inc. (Date 11/07/2016)
8 Samsung (Date 05/02/2016)


I read the docs and tried:

df.iloc[:,0]


The problem is that I lost information and order over the other columns. I idea of how to flat without lost data in the other cells and order?.

Answer

You can stack the columns and optionally reset the index. By default, stack drops NaN's.

df.stack()
Out: 
0  0    Google, Inc. (Date 11/07/2016) 
1  0      Apple Inc. (Date 07/01/2016) 
   1          Amazon (Date 11/01/2016) 
2  0       IBM, Inc. (Date 11/08/2016) 
3  0       Microsoft (Date 11/10/2016) 
   1    Google, Inc. (Date 11/10/1990) 
   2    Google, Inc. (Date 11/07/2016) 
   3         Samsung (Date 05/02/2016) 
dtype: object

df.stack().reset_index(drop=True)
Out: 
0    Google, Inc. (Date 11/07/2016) 
1      Apple Inc. (Date 07/01/2016) 
2          Amazon (Date 11/01/2016) 
3       IBM, Inc. (Date 11/08/2016) 
4       Microsoft (Date 11/10/2016) 
5    Google, Inc. (Date 11/10/1990) 
6    Google, Inc. (Date 11/07/2016) 
7         Samsung (Date 05/02/2016) 
dtype: object