D14 D14 - 2 months ago 6
Python Question

Transform a DataFrame whose rows are lists into a normal DataFrame (split each list into columns)

My DataFrame, produced by reading a complex JSON file, looks like the sample below.

Each row is a list stored in a single column.

Below is the sample DataFrame (df):

col
[A,1,3,4,Null]
[B,4,5,6,Null]
[C,7,8,9,Null]


I tried to split the lists into individual columns using pandas, but it didn't work because each row is itself a list.
I want the DataFrame to look like this:

colA,colB,colC,colD,colE
A 1 3 4 Null
B 4 5 6 Null
C 7 8 9 Null


I don't need to specify the column names manually; they can be auto-generated.

Answer

You can use DataFrame.from_records, but first you need to create a nested list from the values of column col:

import pandas as pd

df = pd.DataFrame({'col':[['A',1,3,4,'Null'],['B',4,5,6,'Null'],['C',7,8,9,'Null']]})
print (df)
                  col
0  [A, 1, 3, 4, Null]
1  [B, 4, 5, 6, Null]
2  [C, 7, 8, 9, Null]

print (df.col.values.tolist())
[['A', 1, 3, 4, 'Null'], ['B', 4, 5, 6, 'Null'], ['C', 7, 8, 9, 'Null']]

df1 = pd.DataFrame.from_records(df.col.values.tolist(), 
                                columns=['colA','colB','colC','colD','colE'])

print(df1)
  colA  colB  colC  colD  colE
0    A     1     3     4  Null
1    B     4     5     6  Null
2    C     7     8     9  Null

If you don't need to specify the column names:

df1 = pd.DataFrame.from_records(df.col.values.tolist())
print(df1)
   0  1  2  3     4
0  A  1  3  4  Null
1  B  4  5  6  Null
2  C  7  8  9  Null
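
If the default integer labels are too bare but you still don't want to type the names out, one option is to let pandas number the columns and then prefix them. This is only a sketch: the prefix 'col' is my own choice for illustration, and it yields col0..col4 rather than the colA..colE shown in the question.

```python
import pandas as pd

df = pd.DataFrame({'col': [['A', 1, 3, 4, 'Null'],
                           ['B', 4, 5, 6, 'Null'],
                           ['C', 7, 8, 9, 'Null']]})

# Build the wide frame from the nested list; columns default to 0..4.
df1 = pd.DataFrame.from_records(df['col'].tolist())

# Turn the integer labels into strings with a common prefix: col0, col1, ...
df1 = df1.add_prefix('col')
print(df1)
```

The number of generated names follows the length of the lists, so nothing has to be hard-coded.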

Timings:

# len(df) = 3k (the 3 sample rows repeated 1000 times)
df = pd.concat([df]*1000).reset_index(drop=True)

In [80]: %timeit pd.DataFrame(df['col'].apply(pd.Series).values, columns=['colA','colB','colC','colD','colE'])
1 loop, best of 3: 753 ms per loop

In [81]: %timeit pd.DataFrame.from_records(df.col.values.tolist(), columns=['colA','colB','colC','colD','colE'])
100 loops, best of 3: 3.73 ms per loop
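
If you want to repeat the comparison outside IPython, something like the following should work as a rough sketch. It uses the standard-library timeit module in place of %timeit, and the helper names via_apply and via_from_records are made up for this example. Absolute numbers will differ by machine and pandas version, so treat the figures above as indicative only.

```python
import timeit

import pandas as pd

# The 3 sample rows repeated 1000 times (3000 rows in total).
rows = [['A', 1, 3, 4, 'Null'], ['B', 4, 5, 6, 'Null'], ['C', 7, 8, 9, 'Null']]
df = pd.DataFrame({'col': rows * 1000})
names = ['colA', 'colB', 'colC', 'colD', 'colE']


def via_apply():
    # Expands every list into its own Series, one row at a time.
    return pd.DataFrame(df['col'].apply(pd.Series).values, columns=names)


def via_from_records():
    # Hands the whole nested list to the constructor in one call.
    return pd.DataFrame.from_records(df['col'].tolist(), columns=names)


runs = 3
t_apply = timeit.timeit(via_apply, number=runs) / runs
t_records = timeit.timeit(via_from_records, number=runs) / runs
print('apply(pd.Series): %.2f ms per call' % (t_apply * 1000))
print('from_records:     %.2f ms per call' % (t_records * 1000))
```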