Rita Rita - 5 months ago 14
Python Question

How to remove substrings marked with special characters from a string?

I have a string in Python:

Tt = "This is a <\"string\">string, It should be <\"changed\">changed to <\"a\">a nummber."

print(repr(Tt))

'This is a <"string">string, It should be <"changed">changed to <"a">a nummber.'

As you can see, some words are repeated inside the parts delimited by <" and ">.

My question is: how can I delete those repeated parts (the delimiters and everything between them)?

The result should be like:

'This is a string, It should be changed to a nummber.'


Use regular expressions:

import re
Tt = re.sub(r'<".*?">', '', Tt)

Note the ? after the *. It makes the quantifier non-greedy, so it matches as few characters as possible between <" and ">.
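To see what the ? changes, here is a small sketch that runs both variants on the string from the question. The variable names (text, greedy, non_greedy) are only illustrative.

```python
import re

text = 'This is a <"string">string, It should be <"changed">changed to <"a">a nummber.'

# Greedy: .* extends to the LAST "> in the string,
# so everything from the first <" up to that point is removed.
greedy = re.sub(r'<".*">', '', text)

# Non-greedy: .*? stops at the FIRST "> after each <",
# so each marked part is removed separately.
non_greedy = re.sub(r'<".*?">', '', text)

print(greedy)      # This is a a nummber.
print(non_greedy)  # This is a string, It should be changed to a nummber.
```

Keep in mind that . does not match a newline by default, so if a marked part can span several lines, passing flags=re.DOTALL to re.sub would be needed.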

James's solution works only when each delimiter is a single character (such as < and >). In that case you can use a negated character class like [^>]. If you want to remove a substring delimited by multi-character sequences (e.g. begin and end), you should use a non-greedy regular expression (i.e. .*?).
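A minimal sketch of the two cases, using made-up sample strings. The markers begin and end are taken from the example above; the variable names are hypothetical.

```python
import re

# Single-character delimiters: a negated character class is enough,
# because "anything except >" cannot run past the closing delimiter.
angle = 'keep <drop> keep <drop too> end'
no_angle = re.sub(r'<[^>]*>', '', angle)

# Multi-character delimiters: a character class cannot express
# "anything except the sequence end", so use a non-greedy match instead.
marked = 'a begin x end b begin y end c'
no_marked = re.sub(r'begin.*?end', '', marked)

print(repr(no_angle))   # 'keep  keep  end'
print(repr(no_marked))  # 'a  b  c'
```

Both substitutions leave the surrounding spaces in place, which is why the results contain double spaces.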