I have a dataframe with multiple columns and a few thousand rows of text data. One column contains floats that represent time in ascending order (0, 0.45, 0.87, 1.10, etc.). From this I want to build a new dataframe that contains only the rows whose time values are closest to the integers x = 0, 1, 2, 3, ...
On Stack Overflow I found an answer by DSM to a very similar question. The code is essentially the following, modified (hopefully) to give the single closest number to x, where df is my dataframe:
for x in np.arange(len(df)):
I don't know how fast this would be, but you could round the times to get "integer" candidates, take the absolute value of the difference to give yourself a way to find the closest, then sort by that difference and group by the integer time to return just the rows that are closest to each integer:
    import pandas as pd

    # setting up my fake data
    df = pd.DataFrame()
    df['ElapsedTime'] = pd.Series([0.5, 0.8, 1.1, 1.4, 1.8, 2.2, 3.1])

    # To use your own data set, set df = Z, and start here...
    df['bintime'] = df.ElapsedTime.round()
    df['d'] = abs(df.ElapsedTime - df.bintime)
    # sort_values replaces the older df.sort, which newer pandas versions no longer provide
    dfindex = df.sort_values('d').groupby('bintime').first()
For the fake time series defined above, the contents of dfindex are:
             ElapsedTime    d
    bintime
    0                0.5  0.5
    1                1.1  0.1
    2                1.8  0.2
    3                3.1  0.1
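
Note that the row with ElapsedTime 0.5 is returned for x = 0 even though it is a full 0.5 away, because it is the only candidate in that bin. If that is not wanted, the same round / distance / group idea can be wrapped in a small helper with an optional cutoff. The following is only a sketch under a few assumptions: the function name `rows_closest_to_integers` and its `time_col` and `max_dist` parameters are made up for illustration, the dataframe index is assumed to be unique, and `groupby(...).idxmin()` is used in place of sorting so that the original rows and index labels are kept.

```python
import pandas as pd


def rows_closest_to_integers(df, time_col="ElapsedTime", max_dist=None):
    """For each integer, keep the row whose time_col value is nearest to it.

    Hypothetical helper: assumes df has a unique index and a numeric time_col.
    """
    nearest = df[time_col].round()          # integer candidate for every row
    dist = (df[time_col] - nearest).abs()   # distance to that candidate
    # Per integer bin, idxmin gives the index label of the smallest distance.
    keep = dist.groupby(nearest).idxmin()
    out = df.loc[keep]
    if max_dist is not None:
        # Optionally drop bins whose best row is still too far from the integer.
        mask = (dist.loc[keep] <= max_dist).to_numpy()
        out = out[mask]
    return out


df = pd.DataFrame({"ElapsedTime": [0.5, 0.8, 1.1, 1.4, 1.8, 2.2, 3.1]})
closest = rows_closest_to_integers(df)
closest_strict = rows_closest_to_integers(df, max_dist=0.25)
print(closest)
print(closest_strict)
```

With the fake data above, `closest` should contain the same four rows as dfindex, while `closest_strict` drops the 0.5 row because no time value lies within 0.25 of 0.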