double0darbo double0darbo - 1 year ago 82
Python Question

Amend column values according to timedelta and index

I would like to change my data in a pandas dataframe.

The data I collect needs to be assigned a step value. The conditions of what triggers a step change are occasionally time or high pressure or temperature values. I cannot get past the first step: When the row is over a certain pressure (1100 psi) and under a temp (40 C), this is the "dilution" phase.

When attempting to change the value with:

df.ix[(df['press'] > 1100) & (df['temp'] < 40),'proc'] = 'dilute';

I only seem to modify the top two rows.

time mass temp press proc
00:00:00 10:58:07 21.947102 23.306101 1.830506 dilute
00:00:01 10:58:08 22.076259 23.306101 57.274142 dilute
00:00:02 10:58:09 22.094710 23.306101 196.000203 pressurize
00:00:03 10:58:10 22.113161 23.306101 293.318991 pressurize
00:00:03 10:58:10 22.094710 23.306101 361.161415 pressurize

time mass temp press proc
00:36:12 11:34:19 18.201538 39.798763 -1.678585 pressurize
00:36:13 11:34:20 18.183087 39.719165 -1.444645 pressurize
00:36:14 11:34:21 18.183087 39.671407 -1.444645 pressurize
00:36:15 11:34:22 18.219989 39.703246 -1.444645 pressurize
00:36:16 11:34:23 18.201538 39.758964 -1.444645 pressurize

Upon further inspection, the indexing does seem to work, giving me the index where I would expect to see the dilution occur...

print(df.ix[(df['press'] > 1100) & (df['temp'] < 40),'proc'].head(),
df.ix[(df['press'] > 1100) & (df['temp'] < 40),'proc'].tail())
00:00:26 pressurize
00:00:27 pressurize
00:00:28 pressurize
00:00:29 pressurize
00:00:30 pressurize
Name: proc, dtype: object time
00:26:08 pressurize
00:26:09 pressurize
00:26:10 pressurize
00:26:11 pressurize
00:26:12 pressurize
Name: proc, dtype: object

However, when applying it to my data, I get only the first two values changed, and the message--

FutureWarning: in the future, boolean array-likes will be handled as a
boolean array index values[indexer] = value'

Running the cookbook examples does give the expected response.

It seems that I have a nested index, but I'm not clear on why, or how to go about amending this. There are a few layers here and searches for solutions have not proved useful or provided the best route to help clarify.

I thought to reset the index, and go with numbers, but I need to sort steps by values and timedeltas.

The index is a timedelta, which I needed to normalize a number of runs launched over a number of periods to start all runs at the same time 0 seconds. My searches only yield date munging and not time, hence my normalizing values to zero with a timedelta index.

If there is a better way to publish this question, or more clarity, please ask. I'm more than willing to add clarity or trim. It is hard to predict what the helpful info would look like to a professional coder.

With help from Merlin, the answer was accepted and works. However, the solution still requires a lot of brute force, and skirts the heart of where the confusion came from: The lack of versatility in having to set up conditions for each call, thanks to the timedelta index. Since I have several conditions that need to be assigned for stages during data collection, I was hoping for a cleaner option.

Here is the rundown:

df['proc'] = np.where((df['press']>1100),'gas soak','pressurize')

idxPnotT = df[df.proc == 'gas soak'].index.tolist()
idxHS = idxPnotT[0]
idxDil0 = idxPnotT[0] + pd.Timedelta(minutes=1)
idxPnot100 = df[(df['press'] > 100)].index.tolist()
idxPnot100 = idxPnot100[-1];

df.loc[idxHS:idxDil0].proc = 'gas soak'
df.loc[idxPostHS:idxPnot100].proc = 'vent'
df.loc[idxPnot100:].proc = 'open'

x = df.ix[df.proc == 'dilute']

The info may look slightly different, but needed to use a new dataset:
<class 'pandas.core.frame.DataFrame'>
TimedeltaIndex: 3383 entries, 00:00:00 to 00:56:25
Data columns (total 5 columns):
time 3383 non-null object
mass 3383 non-null float64
temp 3383 non-null float64
press 3383 non-null float64
proc 3383 non-null object
dtypes: float64(3), object(2)
memory usage: 158.6+ KB


Out[138]: In [139]:
TimedeltaIndex(['00:00:00', '00:00:01', '00:00:02', '00:00:03', '00:00:04',
'00:00:05', '00:00:06', '00:00:07', '00:00:08', '00:00:09',
'00:56:16', '00:56:17', '00:56:18', '00:56:19', '00:56:20',
'00:56:21', '00:56:22', '00:56:23', '00:56:24', '00:56:25'],
dtype='timedelta64[ns]', name='time', length=3383, freq=None)

The code isn't pretty, and lacks the smoothness python allows, and I continue to get errors, without really knowing where they come from having tested both options from the caveats on the pandas page:

**SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead

See the caveats in the documentation:
self[name] = value**

Thanks again for all of the help!

Answer Source

Try this


df['proc']  = np.where((df['press'] > 1100) & (df['temp'] < 40),'dilute', "pressurized")
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download