ingnie - 4 months ago
Python Question

Creating a new column in Pandas by selecting part of string in other column

I have a lot of experience programming in MATLAB and am now using Python, but I just can't get this to work. I have a DataFrame containing a column of timecodes like 00:00:00.033.

import pandas as pd

timecodes = ['00:00:01.001', '00:00:03.201', '00:00:09.231', '00:00:11.301', '00:00:20.601', '00:00:31.231', '00:00:90.441', '00:00:91.301']
df = pd.DataFrame(timecodes, columns=['TimeCodes'])

All my inputs are 90 seconds or less, so I want to create a column with just the seconds as a float. To do this, I need to select from position 6 to the end of the string and convert that to a float, which I can do for the first row like this:

float(df['TimeCodes'][0][6:])

This works just fine, but if I now want to create a whole new column 'Time_sec', the following does not work:

df['Time_sec'] = float(df['TimeCodes'][:][6:])

This is because df['TimeCodes'][:][6:] selects row 6 through the last row, whereas I want the characters from position 6 to the end WITHIN each row. This does not work either:

df['Time_sec'] = float(df['TimeCodes'][:,6:])

Do I need to make a loop? There must be a better way... And why does df['TimeCodes'][:][6:] not work?


You can use the .str.slice string method, which slices every string in the column, and then cast the result to float:

In [13]: df["TimeCodes"].str.slice(6).astype(float)
0     1.001
1     3.201
2     9.231
3    11.301
4    20.601
5    31.231
6    90.441
7    91.301
Name: TimeCodes, dtype: float64
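
To get the new column the question asks for, the result can be assigned back to the DataFrame. The following is a minimal sketch that rebuilds the df from the question; the name time_sec_alt is only illustrative, and .str[6:] is shown as an equivalent spelling of .str.slice(6):

```python
import pandas as pd

timecodes = ['00:00:01.001', '00:00:03.201', '00:00:09.231', '00:00:11.301',
             '00:00:20.601', '00:00:31.231', '00:00:90.441', '00:00:91.301']
df = pd.DataFrame(timecodes, columns=['TimeCodes'])

# Slice each string from position 6 onward, then convert the column to float.
df['Time_sec'] = df['TimeCodes'].str.slice(6).astype(float)

# Equivalent spelling using the .str indexing shorthand (illustrative name).
time_sec_alt = df['TimeCodes'].str[6:].astype(float)
```

No explicit loop is needed: the .str accessor applies the slice to every row of the column.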

As for why df['TimeCodes'][:][6:] doesn't work: it chains several selections. First you grab the pd.Series associated with the TimeCodes column, then [:] selects all of the items in that Series, and finally [6:] selects the items at position 6 and onward. Every step selects rows of the Series, never characters within each string.
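
The chained selection can be unpacked step by step. This is a rough sketch using the data from the question; the intermediate names (col, all_rows, tail_rows, tail_chars, float_failed) are made up for illustration:

```python
import pandas as pd

timecodes = ['00:00:01.001', '00:00:03.201', '00:00:09.231', '00:00:11.301',
             '00:00:20.601', '00:00:31.231', '00:00:90.441', '00:00:91.301']
df = pd.DataFrame(timecodes, columns=['TimeCodes'])

col = df['TimeCodes']      # the whole column as a Series
all_rows = col[:]          # [:] on a Series keeps every row
tail_rows = all_rows[6:]   # [6:] keeps the rows from position 6 on, not characters

# float() cannot convert a Series holding more than one row.
try:
    float(tail_rows)
    float_failed = False
except TypeError:
    float_failed = True

# The .str accessor applies the slice inside each string instead.
tail_chars = col.str[6:]
```

Here tail_rows holds only the last two timecodes, still as full strings, while tail_chars holds the seconds part of every row.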