Cyrine Ezzahra Cyrine Ezzahra - 3 months ago 6
Python Question

add new columns with conditions to a dataframe using python

I create a dataframe df_energy :

df_energy=pd.read_csv('C:/Users/Demonstrator/Downloads/power.csv', delimiter=';', parse_dates=[0], infer_datetime_format = True)


with this structure :

df_energy.info()



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 43229 entries, 0 to 43228
Data columns (total 6 columns):
TIMESTAMP 43229 non-null datetime64[ns]
P_ACT_KW 40376 non-null float64
PERIODE_TARIF 43209 non-null object
P_SOUSCR 37501 non-null float64
SITE 43229 non-null object
TARIF 43229 non-null object
dtypes: datetime64[ns](1), float64(2), object(3)
memory usage: 2.0+ MB


TIMESTAMP P_ACT_KW PERIODE_TARIF P_SOUSCR SITE TARIF
2015-07-31 23:00:00 12.0 HC NaN ST GEREON TURPE_HTA5
2015-07-31 23:10:00 466.0 HC 425.0 ST GEREON TURPE_HTA5
2015-07-31 23:20:00 18.0 HC 425.0 ST GEREON TURPE_HTA5
2015-07-31 23:30:00 17.0 HC 425.0 ST GEREON TURPE_HTA5



As I am starting learning python, I would like to know can I add three new columns : High_energy, Medium_energy and low_energy .

High_energy contains the P_ACT_KW value if P_ACT_KW > 400, Medium_energy contains the P_ACT_KW value if P_ACT_KW is between 200 and 400, Low_energy contains the P_ACT_KW value if P_ACT_KW < 200.
For example :


TIMESTAMP P_ACT_KW PERIODE_TARIF P_SOUSCR SITE TARIF High_energy Medium_energy Low_energy
2015-07-31 23:00:00 12.0 HC NaN ST GEREON TURPE_HTA5 0 0 12
2015-07-31 23:10:00 466.0 HC 425.0 ST GEREON TURPE_HTA5 466 0 0
2015-07-31 23:20:00 18.0 HC 425.0 ST GEREON TURPE_HTA5 0 0 18
2015-07-31 23:30:00 17.0 HC 425.0 ST GEREON TURPE_HTA5 0 0 17



Thank you

Kind regards

Answer

you can use np.where from numpy as:
Sample df:

Out[71]: 
             TIMESTAMP  P_ACT_KW PERIODE_TARIF  P_SOUSCR       SITE  \
0  2015-07-31 23:00:00      12.0            HC       NaN  ST GEREON   
1  2015-07-31 23:10:00     466.0            HC     425.0  ST GEREON   
2  2015-07-31 23:20:00      18.0            HC     425.0  ST GEREON   
3  2015-07-31 23:30:00      17.0            HC     425.0  ST GEREON   

        TARIF  
0  TURPE_HTA5  
1  TURPE_HTA5  
2  TURPE_HTA5  
3  TURPE_HTA5

df['high_energy']=np.where(df['P_ACT_KW']>400,df['P_ACT_KW'],0)

df['medium_energy']=np.where((df['P_ACT_KW']>200)&(df['P_ACT_KW']<400),df['P_ACT_KW'],0)

df['low_energy']=np.where(df['P_ACT_KW']<200,df['P_ACT_KW'],0)

Out[72]: 
             TIMESTAMP  P_ACT_KW PERIODE_TARIF  P_SOUSCR       SITE  \
0  2015-07-31 23:00:00      12.0            HC       NaN  ST GEREON   
1  2015-07-31 23:10:00     466.0            HC     425.0  ST GEREON   
2  2015-07-31 23:20:00      18.0            HC     425.0  ST GEREON   
3  2015-07-31 23:30:00      17.0            HC     425.0  ST GEREON   

        TARIF  high_energy  medium_energy  low_energy  
0  TURPE_HTA5          0.0            0.0        12.0  
1  TURPE_HTA5        466.0            0.0         0.0  
2  TURPE_HTA5          0.0            0.0        18.0  
3  TURPE_HTA5          0.0            0.0        17.0