nilesh nilesh -4 years ago 194
Python Question

Pandas split a column in data-frame and get headers

I have a pandas DataFrame with a single column 'A':

import pandas as pd
dfc = pd.DataFrame( {"A": ['AB=0.246154;ABP=39.3908;AC=3', 'AB=0.3;ABP=9.95901;AC=2;AF=0.333333', 'AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86', 'AB=0.461538;ABP=3.51141;AC=2']})


I want to split column 'A' and get a new DataFrame like this:

                                     A        AB      ABP AC        AF    AN    AO
0         AB=0.246154;ABP=39.3908;AC=3  0.246154  39.3908  3      None  None  None
1  AB=0.3;ABP=9.95901;AC=2;AF=0.333333       0.3  9.95901  2  0.333333  None  None
2      AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86         0        0  6         1     6    86
3         AB=0.461538;ABP=3.51141;AC=2  0.461538  3.51141  2      None  None  None


I tried splitting the column with:

dfc.A.str.split(';', expand = True)


But that gives a new DataFrame with numbered columns and the keys still embedded in the values:

             0            1     2            3     4      5
0  AB=0.246154  ABP=39.3908  AC=3         None  None   None
1       AB=0.3  ABP=9.95901  AC=2  AF=0.333333  None   None
2         AB=0        ABP=0  AC=6         AF=1  AN=6  AO=86
3  AB=0.461538  ABP=3.51141  AC=2         None  None   None


How can I name the columns using the text before "=" in each value, and add the resulting columns to the original DataFrame?
Is there a Pythonic way to do both operations in one line?

Thanks

Answer Source

Use str.extractall to capture every key=value pair, then unstack the keys into columns:

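# Each match yields a (key, value) pair; the result is indexed by (row, match number).
e = dfc.A.str.extractall('([^;]+)=([^;]+)')
# Re-index the values by (row, key), then pivot the keys into columns.
pd.Series(e.values[:, 1], [e.index.get_level_values(0), e.values[:, 0]]).unstack()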

         AB      ABP AC        AF    AN    AO
0  0.246154  39.3908  3      None  None  None
1       0.3  9.95901  2  0.333333  None  None
2         0        0  6         1     6    86
3  0.461538  3.51141  2      None  None  None
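
The result above holds only the parsed columns, while the question also asks to add them to the original frame. A minimal sketch of one way to do that, assuming every `;`-separated item contains an `=`. It swaps in a plain-Python parse in place of `extractall`, and the names `records`, `parsed`, and `result` are illustrative, not part of the answer:

```python
import pandas as pd

dfc = pd.DataFrame({"A": [
    "AB=0.246154;ABP=39.3908;AC=3",
    "AB=0.3;ABP=9.95901;AC=2;AF=0.333333",
    "AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86",
    "AB=0.461538;ABP=3.51141;AC=2",
]})

# Turn each "key=value;key=value" string into a dict, one dict per row.
# Assumption: every item contains "="; split("=", 1) keeps any later "=" in the value.
records = [dict(item.split("=", 1) for item in s.split(";")) for s in dfc["A"]]

# Build a frame from the dicts, reusing the original index so the join lines up.
# Keys missing from a row are filled with NaN.
parsed = pd.DataFrame(records, index=dfc.index)

# Attach the parsed columns to the original frame.
result = dfc.join(parsed)
print(result)
```

The parsed values are still strings here; if numbers are needed, something like `parsed.apply(pd.to_numeric)` before the join is one option.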