Pylander Pylander - 2 months ago 12
Python Question

Pandas pd.cut Field Where Other Field Equals Value

I am trying to use pd.cut to create a new field. The creation/population of this new field is dependent on a value in another field, however.

hdl_bins = [0,40,59,300]
hdl_labels = ['hdl_high risk','hdl_borderline','hdl_protective']

df['hdl'] = pd.cut(df['value'],bins=hdl_bins,labels=hdl_labels)


I would only like this populate the new field "hdl" when the following criteria is met:

df[(df['name']=='HDL')


How would I best add the "where" criteria to the pd.cut operation? Thanks!

EDIT:

Here is an example of the input:

id,date,name,value
1,1/1/11,Weight,76.3
1,1/2/11,Height,152.7
1,1/3/11,Body mass index (BMI) [Ratio],32.7
1,1/4/11,Temperature,98.6
1,1/5/11,Systolic,118.9
1,1/6/11,Diastolic,69.8
1,1/7/11,LDL,98
1,1/8/11,HDL,63.2
1,1/9/11,Total Cholesterol,263.1
1,1/10/11,Trigl SerPl-mCnc,509.7
1,1/11/11,LDL,98
1,1/12/11,HDL,63.2
1,1/13/11,Total Cholesterol,263.1
1,1/14/11,Trigl SerPl-mCnc,509.7


Desired Output:

id,date,name,value,hdl
1,1/1/11,Weight,76.3,0
1,1/2/11,Height,152.7,0
1,1/3/11,Body mass index (BMI) [Ratio],32.7,0
1,1/4/11,Temperature,98.6,0
1,1/5/11,Systolic,118.9,0
1,1/6/11,Diastolic,69.8,0
1,1/7/11,LDL,98,0
1,1/8/11,HDL,63.2,hdl_protective
1,1/9/11,Total Cholesterol,263.1,0
1,1/10/11,Trigl SerPl-mCnc,509.7,0
1,1/11/11,LDL,98,0
1,1/12/11,HDL,63.2,hdl_protective
1,1/13/11,Total Cholesterol,263.1,0
1,1/14/11,Trigl SerPl-mCnc,509.7,0

Answer

UPDATE:

Pay attention at the last row (value == 'XXX'):

In [55]: df
Out[55]:
    id      date                           name  value
0    1    1/1/11                         Weight   76.3
1    1    1/2/11                         Height  152.7
2    1    1/3/11  Body mass index (BMI) [Ratio]   32.7
3    1    1/4/11                    Temperature   98.6
4    1    1/5/11                       Systolic  118.9
5    1    1/6/11                      Diastolic   69.8
6    1    1/7/11                            LDL     98
7    1    1/8/11                            HDL   63.2
8    1    1/9/11              Total Cholesterol  263.1
9    1   1/10/11               Trigl SerPl-mCnc  509.7
10   1   1/11/11                            LDL     98
11   1   1/12/11                            HDL   63.2
12   1   1/13/11              Total Cholesterol  263.1
13   1   1/14/11               Trigl SerPl-mCnc  509.7
14   1  12/12/12                            HDL    XXX

Solution:

In [56]: df['hdl'] = '0'

In [57]: df.ix[df['name']=='HDL', 'hdl'] = \
   ....:   pd.cut(pd.to_numeric(df.ix[df['name']=='HDL','value'], errors='corce'),bins=hdl_bins,labels=hdl_labels)

In [58]: df
Out[58]:
    id      date                           name  value             hdl
0    1    1/1/11                         Weight   76.3               0
1    1    1/2/11                         Height  152.7               0
2    1    1/3/11  Body mass index (BMI) [Ratio]   32.7               0
3    1    1/4/11                    Temperature   98.6               0
4    1    1/5/11                       Systolic  118.9               0
5    1    1/6/11                      Diastolic   69.8               0
6    1    1/7/11                            LDL     98               0
7    1    1/8/11                            HDL   63.2  hdl_protective
8    1    1/9/11              Total Cholesterol  263.1               0
9    1   1/10/11               Trigl SerPl-mCnc  509.7               0
10   1   1/11/11                            LDL     98               0
11   1   1/12/11                            HDL   63.2  hdl_protective
12   1   1/13/11              Total Cholesterol  263.1               0
13   1   1/14/11               Trigl SerPl-mCnc  509.7               0
14   1  12/12/12                            HDL    XXX             NaN

Old answer:

In [13]: df
Out[13]:
   value name
0    123  XXX
1     18  LDL
2    195  LDL
3     25  XXX
4     70  LDL
5     11  LDL
6    199  XXX
7    163  LDL
8     32  LDL
9     85  XXX

In [14]: hdl_bins = [0,40,59,300]

In [15]: hdl_labels = ['hdl_high risk','hdl_borderline','hdl_protective']

In [16]: df['hdl'] = ''

In [22]: df.ix[df['name']=='LDL', 'hdl'] = \
   ....:     pd.cut(df.ix[df['name']=='LDL','value'],bins=hdl_bins,labels=hdl_labels)

In [23]: df
Out[23]:
   value name             hdl
0    123  XXX
1     18  LDL   hdl_high risk
2    195  LDL  hdl_protective
3     25  XXX
4     70  LDL  hdl_protective
5     11  LDL   hdl_high risk
6    199  XXX
7    163  LDL  hdl_protective
8     32  LDL   hdl_high risk
9     85  XXX
Comments