student student - 23 days ago 9
Python Question

How to flat/reshape a multidimentional dataframe by fixing a unique column?

I am trying to reshape a pandas dataframe. So, consider the following

df
:

0 1 2 3 4 5 6 7 8 unique_id
Apple (Date 09/21/2016) NaN NaN NaN NaN NaN NaN NaN NaN 101
bannana (Date 09/30/2016) NaN NaN NaN NaN NaN NaN NaN NaN 102
cherry (New 04/05/2016) tomatoes (New 04/07/2016) potatoes (Date 11/08/2016) pineapples (Date 04/11/2016) carrots (Date 11/01/2016) NaN NaN NaN NaN 103
coconuts (Date 11/10/2016) NaN NaN NaN NaN NaN NaN NaN NaN 104
oranges (Date 11/01/2016) watermelon (Date 11/07/2016) pears (Date 11/08/2016) carrots (Date 11/01/2016) blackberry (New 01/22/2015) NaN NaN NaN NaN 105
kiwi (Date 11/09/2016) carrots (Date 11/01/2016) NaN NaN NaN NaN NaN NaN NaN 106


How can I set as an unique index the
unique_id
column in the following way?
(*)
:

0 unique_id
Apple (Date 09/21/2016) 101
bannana (Date 09/30/2016) 102
cherry (New 04/05/2016) 103
tomatoes (New 04/07/2016) 103
potatoes (Date 11/08/2016) 103
pineapples (Date 04/11/2016) 103
carrots (Date 11/01/2016) 103
NaN 103
NaN 103
NaN 103
NaN 103
coconuts (Date 11/10/2016) 104
NaN 104
NaN 104
NaN 104
NaN 104
NaN 104
NaN 104
NaN 104
NaN 104
oranges (Date 11/01/2016) 105
watermelon (Date 11/07/2016) 105
pears (Date 11/08/2016) 105
carrots (Date 11/01/2016) 105
blackberry(New 01/22/2015) 105
NaN 105
NaN 105
NaN 105
NaN 105
kiwi (Date 11/09/2016) 106
carrots (Date 11/01/2016) 106
NaN 106
NaN 106
NaN 106
NaN 106
NaN 106
NaN 106
NaN 106


For the above, I tried to use the stack and reset_index() methods in order to set as an index the
unique_id
column, in order to use it as "pivot":

df = pd.DataFrame(df.stack().reset_index(drop=False))
df['col-index'] = df['unique_id']
df = df.set_index('col-index')


However, I guess I am complicating this task, thus how can I reindex the dataframe in order to get
(*)
?

Answer

If dont need rows with NaN values, use stack with set_index and reset_index:

df = df.set_index('unique_id')
       .stack()
       .reset_index(level=1, drop=True)
       .reset_index(name='a')

print (df)
    unique_id                             a
0         101       Apple (Date 09/21/2016)
1         102     bannana (Date 09/30/2016)
2         103       cherry (New 04/05/2016)
3         103     tomatoes (New 04/07/2016)
4         103    potatoes (Date 11/08/2016)
5         103  pineapples (Date 04/11/2016)
6         103     carrots (Date 11/01/2016)
7         104    coconuts (Date 11/10/2016)
8         105     oranges (Date 11/01/2016)
9         105  watermelon (Date 11/07/2016)
10        105       pears (Date 11/08/2016)
11        105     carrots (Date 11/01/2016)
12        105   blackberry (New 01/22/2015)
13        106        kiwi (Date 11/09/2016)
14        106     carrots (Date 11/01/2016)

If need NaN use:

df = df.set_index('unique_id')
       .stack(dropna=False)
       .reset_index(level=1, drop=True)
       .reset_index(name='a')

print (df)
    unique_id                             a
0         101       Apple (Date 09/21/2016)
1         101                           NaN
2         101                           NaN
3         101                           NaN
4         101                           NaN
5         101                           NaN
6         101                           NaN
7         101                           NaN
8         101                           NaN
9         102     bannana (Date 09/30/2016)
10        102                           NaN
11        102                           NaN
12        102                           NaN
13        102                           NaN
14        102                           NaN
15        102                           NaN
16        102                           NaN
17        102                           NaN
18        103       cherry (New 04/05/2016)
19        103     tomatoes (New 04/07/2016)
20        103    potatoes (Date 11/08/2016)
21        103  pineapples (Date 04/11/2016)
22        103     carrots (Date 11/01/2016)
23        103                           NaN
24        103                           NaN
25        103                           NaN
26        103                           NaN
27        104    coconuts (Date 11/10/2016)
28        104                           NaN
29        104                           NaN
30        104                           NaN
31        104                           NaN
32        104                           NaN
33        104                           NaN
34        104                           NaN
35        104                           NaN
36        105     oranges (Date 11/01/2016)
37        105  watermelon (Date 11/07/2016)
38        105       pears (Date 11/08/2016)
39        105     carrots (Date 11/01/2016)
40        105   blackberry (New 01/22/2015)
41        105                           NaN
42        105                           NaN
43        105                           NaN
44        105                           NaN
45        106        kiwi (Date 11/09/2016)
46        106     carrots (Date 11/01/2016)
47        106                           NaN
48        106                           NaN
49        106                           NaN
50        106                           NaN
51        106                           NaN
52        106                           NaN
53        106                           NaN