zihan meng - 27 days ago
Python Question

How to remove the text between two words using a regular expression in Python

I am trying to clean up some logs and want to extract general information from the messages. I am new to Python and only learned regular expressions yesterday, and now I am stuck.

My messages look like this:

Report ZSIM_RANDOM_DURATION_ started
Report ZSIM_SYSTEM_ACTIVITY started
Report /BDL/TASK_SCHEDULER started
Report ZSIM_JOB_CREATE started
Report RSBTCRTE started
Report SAPMSSY started
Report RSRZLLG_ACTUAL started
Report RSRZLLG started
Report RGWMON_SEND_NILIST started


I tried this code:

clean_special2=re.sub(r'^[Report] [^1-9] [started]','',text)


However, I think this code will remove entire rows, and I want to keep the format Report ... started. I only want to remove the job name in the middle.

I expect the output to look like this:

Report started


Can anyone help me with an idea? Thank you very much!

Answer

Try something like this:

clean_special2=re.sub(r'(?<=^Report\b).*(?=\bstarted)',' ',text)

Explanation: (?<=...) is a positive lookbehind, i.e. the text just before the match must match the content of this group, but it is not part of the match and thus not replaced. The same applies on the other side with the positive lookahead (?=...). The \b is a word boundary, so everything between these two words is matched. Since this also removes the surrounding whitespace, the replacement is a single space. Note that ^ matches only at the start of the string by default; if text contains several lines, pass flags=re.MULTILINE so that it matches at the start of each line.
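To see the pattern at work, here is a minimal sketch that runs it over a few of the sample lines from the question. It assumes the whole log is held in one multi-line string (the question does not say how text is stored), so re.MULTILINE is added. The names pattern, cleaned and grouped are only illustrative.

```python
import re

# A few sample lines from the question, joined into one string (an assumption).
text = """Report ZSIM_RANDOM_DURATION_ started
Report /BDL/TASK_SCHEDULER started
Report RSBTCRTE started"""

# The lookbehind and lookahead keep "Report" and "started" out of the match,
# so only the text between them (including the spaces) is replaced.
# re.MULTILINE lets ^ match at the start of every line, not just the first.
pattern = re.compile(r'(?<=^Report\b).*(?=\bstarted)', flags=re.MULTILINE)
cleaned = pattern.sub(' ', text)
print(cleaned)
# Report started
# Report started
# Report started

# An alternative without lookarounds: capture both words and write them back.
grouped = re.sub(r'^(Report)\b.*\b(started)', r'\1 \2', text, flags=re.MULTILINE)
```

If you process the log one line at a time instead, the flag is not needed, because ^ then always refers to the start of that single line.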