For example, I have a list of action strings as an input corpus (in practice it's a big list with 100 values).
The data contains a column called action_description. How can I extract all the string matches in action_description, using the action list as the input corpus?
Note: I have already lemmatized action_description, so if the column has words like jumping or jumped, they are already converted to jump.
Sample input & output
"I love to run and while my friend prefer to swim" --> "run swim"
"Allan excels at high jump but he is not a good at running" --> "jump run"
Lemmatize with pos='v' and let the nouns remain as they were, iterating through each word in the list obtained from str.split:
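    import pandas as pd
    from nltk.stem.wordnet import WordNetLemmatizer  # needs the WordNet corpus: nltk.download('wordnet')

    df = pd.DataFrame(dict(action_description=[
        "I love to run and while my friend prefer to swim",
        "Allan excels at high jump but he is not a good at running"]))

    action = ['jump', 'fly', 'run', 'swim']  # lookup list
    lem = WordNetLemmatizer()

    def fcn(words):
        # Lemmatize every word as a verb; nouns are left unchanged.
        lemmas = [lem.lemmatize(w, 'v') for w in words]
        # dict.fromkeys drops repeats and keeps first-appearance order, which a set would not.
        return " ".join(dict.fromkeys(w for w in lemmas if w in action))

    df['action_description'] = df['action_description'].str.split().apply(fcn)
    df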
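Since the question notes that the column is already lemmatized, the lemmatizer may not be needed at all. Below is a minimal sketch of that case using only the standard library; `extract_actions` is a hypothetical helper name, and the second sample sentence is written in its assumed lemmatized form ("run" instead of "running").

```python
import re

action = ['jump', 'fly', 'run', 'swim']  # lookup list

# One alternation pattern with word boundaries, so 'run' does not match inside 'rerun'.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, action)) + r')\b')

def extract_actions(text):
    """Hypothetical helper: matched actions, in order of first appearance."""
    matches = pattern.findall(text.lower())
    # dict.fromkeys drops repeated matches while keeping their order.
    return ' '.join(dict.fromkeys(matches))

print(extract_actions("I love to run and while my friend prefer to swim"))   # run swim
print(extract_actions("Allan excels at high jump but he is not a good at run"))  # jump run
```

On a DataFrame this could be applied with `df['action_description'].apply(extract_actions)`. Note that without lemmatization an inflected form such as "running" would not match, because of the word boundaries.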
To generate binary flags (0/1), we can use the str.get_dummies method, which splits each string on whitespace and computes its indicator variables, as shown:
    bin_flag = df['action_description'].str.get_dummies(sep=' ').add_suffix('_flag')
    pd.concat([df['action_description'], bin_flag], axis=1)
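One thing to be aware of: str.get_dummies only creates columns for actions that actually occur in the data, so an action such as fly would get no flag column here. If a flag for every entry in the lookup list is wanted, one option is to reindex the columns against that list. This is a sketch under that assumption, with the column values hard-coded to what the extraction step is expected to produce.

```python
import pandas as pd

action = ['jump', 'fly', 'run', 'swim']  # lookup list

# Assumed state of the column after the extraction step.
df = pd.DataFrame({'action_description': ['run swim', 'jump run']})

flags = (
    df['action_description']
    .str.get_dummies(sep=' ')
    .reindex(columns=action, fill_value=0)  # absent actions become all-zero columns
    .add_suffix('_flag')
)
result = pd.concat([df['action_description'], flags], axis=1)
print(result)
```

Reindexing also fixes the column order to that of the lookup list, which keeps the output layout stable across different data sets.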