Kalimantan Kalimantan - 2 months ago 6
Python Question

row to columns while keeping part of dataframe, display on same row

I am trying to move some of my rows and make the them columns, but keep a large portion of the dataframe the same.

Resulting Dataframe:

ID Thing Level1 Level2 Time OAttribute IsTrue Score Value
1 bicycle value value 9:30 whatever yes 1 type1
1 bicycle value value 9:30 whatever yes 2 type2
2 bicycle value value 2:30 whatever no
4 non-bic value value 3:30 whatever no 4 type3
1 bicycle value value 9:30 whatever yes 3 type3


and I want something like this:

ID Thing Level1 Level2 Time OAttribute IsTrue Type1 Type2 Type3
1 bicycle value value 9:30 whatever yes 1 2 3
2 bicycle value value 2:30 whatever yes
4 non-bic value value 3:30 whatever no 4


I have tried

df_ = df[['Rating', 'Value']].dropna().set_index('Value', append=True).Rating.unstack()

df.drop('Value', 1).merge(df_, right_index=True, left_index=True, how='left').fillna('')

Answer

One way would be to create an intermediate dataframe and then use outer merge.

In [102]: df
Out[102]: 
   ID    Thing Level1 Level2  Time OAttribute IsTrue  Score  Value
0   1  bicycle  value  value  9:30   whatever    yes    1.0  type1
1   1  bicycle  value  value  9:30   whatever    yes    2.0  type2
2   2  bicycle  value  value  2:30   whatever     no    NaN    NaN
3   4  non-bic  value  value  3:30   whatever     no    4.0  type3
4   1  bicycle  value  value  9:30   whatever    yes    3.0  type3

In [103]: dg = pd.DataFrame(columns=pd.np.append(df['Value'].dropna().unique(), ['ID']))

In [104]: for i in range(len(df)):
     ...:     key = df.loc[i]['Value']
     ...:     value = df.loc[i]['Score']
     ...:     ID = df.loc[i]['ID']
     ...:     if key is not pd.np.nan:
     ...:         dg.loc[i, key] = value
     ...:         dg.loc[i, 'ID'] = ID
     ...:                 

In [105]: dg
Out[105]: 
  type1 type2 type3 ID
0     1   NaN   NaN  1
1   NaN     2   NaN  1
3   NaN   NaN     4  4
4   NaN   NaN     3  1

In [106]: dg = dg.groupby('ID').sum().reset_index()

In [107]: dg
Out[107]: 
   ID  type1  type2  type3
0   1    1.0    2.0    3.0
1   4    NaN    NaN    4.0

In [108]: df[df.columns.difference(['Score', 'Value'])].drop_duplicates().merge(dg, how='outer').fillna('')
Out[108]: 
   ID IsTrue Level1 Level2 OAttribute    Thing  Time type1 type2 type3
0   1    yes  value  value   whatever  bicycle  9:30     1     2     3
1   2     no  value  value   whatever  bicycle  2:30                  
2   4     no  value  value   whatever  non-bic  3:30                 4

Another way to calculate the intermediate data frame would be by avoiding the for loop and using unstack():

In [150]: df
Out[150]: 
   ID    Thing Level1 Level2  Time OAttribute IsTrue  Score  Value
0   1  bicycle  value  value  9:30   whatever    yes    1.0  type1
1   1  bicycle  value  value  9:30   whatever    yes    2.0  type2
2   2  bicycle  value  value  2:30   whatever     no    NaN    NaN
3   4  non-bic  value  value  3:30   whatever     no    4.0  type3
4   1  bicycle  value  value  9:30   whatever    yes    3.0  type3

In [151]: dg = df[['Score', 'Value']].dropna().set_index('Value', append=True).Score.unstack().join(df['ID']).groupby('ID').sum().reset_index()

In [152]: df[df.columns.difference(['Score', 'Value'])].drop_duplicates().merge(dg, how='outer').fillna('')
Out[152]: 
   ID IsTrue Level1 Level2 OAttribute    Thing  Time type1 type2 type3
0   1    yes  value  value   whatever  bicycle  9:30     1     2     3
1   2     no  value  value   whatever  bicycle  2:30                  
2   4     no  value  value   whatever  non-bic  3:30                 4
Comments