Sakura Sakura - 2 months ago 12
Python Question

Split several columns using pandas

I want to split the strings in several columns. For example, I'd like to extract some information from col2, col3 and col5 in the dataframe below (in reality I have more than a hundred columns to do this for).

import pandas as pd

d = pd.DataFrame({
    'col1': ['USA', 'AGN'],
    'col2': ['0|0:0.014:0.986,0.013,0', '1|0:0.02:1.936,0.023,1'],
    'col3': ['1|0:0.024:0.9,0.01345,2', '0|2:0.213:0.92,0.1,2'],
    'col4': ['done', 'done'],
    'col5': ['2|0:0.02:1.936,0.023,1', '1|0:0.024:0.9,0.01345,2']
})

   col1                     col2                     col3  col4 .....
0   USA  0|0:0.014:0.986,0.013,0  1|0:0.024:0.9,0.01345,2  done .....
1   AGN   1|0:0.02:1.936,0.023,1     0|2:0.213:0.92,0.1,2  done .....


I only need the first 3 characters of each of those long strings, so I expect my result to look like this:

col1  col2  col3  col4  col5  ....
USA   0|0   1|0   done  2|0   ....
AGN   1|0   0|2   done  1|0   ....


Any hints, please?

Answer

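If I understood your question correctly, you can do it with a regex replace. The pattern ':.*' matches everything from the first colon to the end of the string, so replacing it with '' keeps only the leading part (e.g. '0|0'); cells without a colon, such as 'done', are left as they are. Note that replace returns a new DataFrame, so assign it back (d = d.replace(...)) if you want to keep the result: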

In [254]: d.replace(r':.*', '', regex=True)
Out[254]:
  col1 col2 col3  col4 col5
0  USA  0|0  1|0  done  2|0
1  AGN  1|0  0|2  done  1|0
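With a hundred-plus columns, you may not want the regex applied everywhere, since a colon in some unrelated column would get trimmed too. A minimal sketch of scoping the same replace to a subset of columns follows. The selection rule (a column qualifies if every value contains ':') and the names target_cols and result are illustrative assumptions, not part of the answer; an explicit list of column names works just as well.

```python
import pandas as pd

d = pd.DataFrame({
    'col1': ['USA', 'AGN'],
    'col2': ['0|0:0.014:0.986,0.013,0', '1|0:0.02:1.936,0.023,1'],
    'col3': ['1|0:0.024:0.9,0.01345,2', '0|2:0.213:0.92,0.1,2'],
    'col4': ['done', 'done'],
    'col5': ['2|0:0.02:1.936,0.023,1', '1|0:0.024:0.9,0.01345,2'],
})

# Hypothetical selection rule: treat a column as a "long string" column
# if every one of its values contains a ':'. Replace with whatever rule
# fits the real data, e.g. an explicit list of column names.
target_cols = [
    c for c in d.columns
    if d[c].astype(str).str.contains(':', regex=False).all()
]

# Same regex as the answer, applied only to the selected columns:
# ':.*' matches from the first colon to the end, so only the leading
# part (e.g. '0|0') survives. The original frame d is left unchanged.
result = d.copy()
result[target_cols] = result[target_cols].replace(r':.*', '', regex=True)

print(target_cols)  # ['col2', 'col3', 'col5']
print(result)
```

Scoping the assignment to result[target_cols] is the only change from the one-liner; columns outside that list, like col1 and col4 here, are never passed to the regex.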