zihan meng - 9 months ago
Python Question

How to remove the text between two words using a regular expression in Python

I am trying to clean up some logs and want to extract general information from the messages. I am new to Python and only learned regular expressions yesterday, and now I have a problem.

My messages look like this:

Report /BDL/TASK_SCHEDULER started
Report ZSIM_JOB_CREATE started
Report RSBTCRTE started
Report SAPMSSY started
Report RSRZLLG_ACTUAL started
Report RSRZLLG started

I tried this code:

clean_special2=re.sub(r'^[Report] [^1-9] [started]','',text)

but I think this code would remove entire rows, whereas I want to keep the format Report ... started. So I only want to remove the job name in the middle.

I expect my output to look like this:

Report started

Can anyone help me with an idea? Thank you very much!

Answer

Try something like this:

clean_special2=re.sub(r'(?<=^Report\b).*(?=\bstarted)',' ',text)

Explanation: (?<=...) is a positive lookbehind, i.e. the text before the match must fit the content of this group, but that text is not part of the match and thus is not replaced. The same applies on the other side with the positive lookahead (?=...). The \b is a word boundary, so everything between the two words is matched, including the spaces around the job name. Since those spaces are removed as well, the replacement is a single space.
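To see the pattern at work, here is a minimal sketch that assumes text holds several log lines in one string (the question does not say whether it is one message or the whole log). In that case ^ only matches at the very start of the string unless re.MULTILINE is passed, so the flag is added here. The variable names cleaned and cleaned_alt, and the capture-group variant, are illustrations and not part of the original answer.

```python
import re

# Sample messages taken from the question, one per line.
text = """Report /BDL/TASK_SCHEDULER started
Report ZSIM_JOB_CREATE started
Report RSBTCRTE started"""

# The lookbehind and lookahead keep "Report" and "started" out of the match,
# so only the middle part (job name plus surrounding spaces) is replaced.
# re.MULTILINE lets ^ match at the start of every line, not just the string.
pattern = re.compile(r'(?<=^Report\b).*(?=\bstarted)', flags=re.MULTILINE)
cleaned = pattern.sub(' ', text)
print(cleaned)

# Alternative without lookarounds (an assumption, not from the answer):
# capture both words and write them back with a single space in between.
cleaned_alt = re.sub(r'^(Report)\b.*\b(started)', r'\1 \2', text,
                     flags=re.MULTILINE)
print(cleaned_alt == cleaned)
```

If each message is processed on its own, the flag is unnecessary and the call from the answer works unchanged.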