Dom B Dom B - 3 years ago 209
Python Question

How to search for specific text within a Pandas dataframe column

I am wanting to identify all instances within my Pandas csv file that contains text for a specific column, in this case the 'Notes' column, where there are any instances the word 'excercise' is mentioned. Once the rows are identified that contain the 'excercise' keyword in the 'Notes' columnn, I want to create a new column called 'ExcerciseDay' that then has a 1 if the 'excercise' condition was met or a 0 if it was not. I am having trouble because the text can contain long string values in the 'Notes' column (i.e. 'Excercise, Morning Workout,Alcohol Consumed, Coffee Consumed') and I still want it to identify 'excercise' even if it is within a longer string.

I tried the function below in order to identify all text that contains the word 'exercise' in the 'Notes' column. No rows are selected when I use this function and I know it is likely because of the * operator but I want to show the logic. There is probably a much more efficient way to do this but I am still relatively new to programming and python.

def IdentifyExercise(row):
if row['Notes'] == '*exercise*':
return 1
elif row['Notes'] != '*exercise*':
return 0


JoinedTables['ExerciseDay'] = JoinedTables.apply(lambda row : IdentifyExercise(row), axis=1)

Answer Source

Convert boolean Series created by str.contains to int by astype:

JoinedTables['ExerciseDay'] = JoinedTables['Notes'].str.contains('exercise').astype(int)

For not case sensitive:

JoinedTables['ExerciseDay'] = JoinedTables['Notes'].str.contains('exercise', case=False)
                                                   .astype(int)
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download