Luke Nisbet - 1 month ago 17

Python Question

Hi I am new to using pandas coming from a SAS background and I am trying to segment a continuous variable into bands using the following code.

`var_range = df['BILL_AMT1'].max() - df['BILL_AMT1'].min()`

a= 10

for i in range(1,a):

inc = var_range/a

lower_bound = df['BILL_AMT1'].min() + (i-1)*inc

print('Lower bound is '+str(lower_bound))

upper_bound = df['BILL_AMT1'].max() + (i)*inc

print('Upper bound is '+str(upper_bound))

if (lower_bound <= df['BILL_AMT1'] < upper_bound):

df['bill_class'] = i

i+=1

I am expecting the code to check if the value of

`df['BILL_AMT1']`

`df['bill_class']`

I get the following error:

`ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().`

I think that the if condition is evaluating correctly but the error is due to assigning a new column a value of the for loops counter.

Can anyone explain what is going wrong or suggest an alternative.

Thanks for any help.

Answer

To avoid the `ValueError`

, change

```
if (lower_bound <= df['BILL_AMT1'] < upper_bound):
df['bill_class'] = i
```

to

```
mask = (lower_bound <= df['BILL_AMT1']) & (df['BILL_AMT1'] < upper_bound)
df.loc[mask, 'bill_class'] = i
```

The chained comparison `(lower_bound <= df['BILL_AMT1'] < upper_bound)`

is equivalent to

```
(lower_bound <= df['BILL_AMT1']) and (df['BILL_AMT1'] < upper_bound)
```

The `and`

operator causes the two boolean Series `(lower_bound <= df['BILL_AMT1'])`

, `(df['BILL_AMT1'] < upper_bound)`

to be evaluated in a boolean context -- i.e. reduced to a single boolean value. Pandas refuses to reduce Series to a single boolean value.

Instead, to return a boolean Series, use the `&`

operator instead of `and`

:

```
mask = (lower_bound <= df['BILL_AMT1']) & (df['BILL_AMT1'] < upper_bound)
```

and then to assign values to the `bill_class`

column where `mask`

is True, use `df.loc`

:

```
df.loc[mask, 'bill_class'] = i
```

To bin the data in `df['BILL_AMT1']`

, you could remove the Python `for-loop`

entirely, and as DSM suggests, use `pd.cut`

:

```
df['bill_class'] = pd.cut(df['BILL_AMT1'], bins=10, labels=False)+1
```