vhora - 10 months ago

Python Regex handling dot character

While using regular expressions in Python, I ran into a puzzling case.
If a string contains operators, I want to add a space before and after each operator. For example:

s = 'H>=ll<=o=wo+rl-d.my name!'             # input
op = 'H >= ll <= o = wo + rl - d.my name!'  # desired output

It seemed pretty straightforward, so I came up with the following expression:

import re
re.sub(r'((<=)|(>=)|[+-=*/])+', r' \1 ', s)

but the result I get is:

'H >= ll <= o = wo + rl - d . my name!'

It's adding spaces around every dot (.), even though the dot doesn't appear anywhere in my regex.

I am using Python 2.7 and would really appreciate it if you could shed some light on this.

Answer

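The spaces around the dot come from the unescaped - in your character class. Inside a character class, a - between two characters denotes a range, so [+-=] does not mean the three characters +, - and =; it means every character from + (0x2B) through = (0x3D). That range includes the dot (0x2E), as well as the comma, slash, digits 0-9, colon, semicolon and <. It also contains - itself (0x2D), which is why the minus sign was still matched and the bug was easy to miss.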
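A quick way to confirm this is to test every printable ASCII character against the class. The sketch below (Python 3 syntax; the regex behaviour is the same in 2.7) compares the question's [+-=] with the escaped [+\-=]. The names range_class and escaped_class are purely illustrative.

```python
import re

# "-" between "+" and "=" forms a range, so this class spans 0x2B..0x3D.
range_class = re.compile(r'[+-=]')
# With "-" escaped, the class holds exactly three literal characters.
escaped_class = re.compile(r'[+\-=]')

# Printable ASCII runs from 0x20 (space) to 0x7E (tilde), in code-point order.
printable = [chr(i) for i in range(0x20, 0x7F)]

matched_range = ''.join(c for c in printable if range_class.match(c))
matched_escaped = ''.join(c for c in printable if escaped_class.match(c))

print(matched_range)    # +,-./0123456789:;<=
print(matched_escaped)  # +-=
```

Because the digits sit inside the range too, the original pattern would also put spaces around numbers, not just around the dot.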

To avoid this, escape the - as \-, for example:

re.sub(r'((<=)|(>=)|[+\-=*/])+', r' \1 ', s)

As @LaurentLAPORTE mentioned, you can also put the - at the beginning or the end of the character class, where it is taken literally; both [-+=*/] and [+=*/-] do the trick.
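The sketch below checks that all three spellings of the class turn the question's input into the desired output (Python 3 syntax; the results are the same in 2.7). It also shows one caveat, which is my own addition rather than part of the answer: because the group is followed by +, \1 holds only the last repetition, so adjacent operators such as += lose their first character. Substituting \g<0>, which refers to the whole match, keeps them intact.

```python
import re

s = 'H>=ll<=o=wo+rl-d.my name!'             # input from the question
op = 'H >= ll <= o = wo + rl - d.my name!'  # desired output

# Three ways to keep "-" literal inside the character class.
patterns = [
    r'((<=)|(>=)|[+\-=*/])+',  # escaped hyphen
    r'((<=)|(>=)|[-+=*/])+',   # hyphen first
    r'((<=)|(>=)|[+=*/-])+',   # hyphen last
]

results = [re.sub(pattern, r' \1 ', s) for pattern in patterns]
for pattern, result in zip(patterns, results):
    print(result == op, pattern)  # True for all three

# Caveat: with the outer "+", group 1 keeps only the last repetition,
# so the "+" of "+=" is dropped. \g<0> substitutes the whole match instead.
last_only = re.sub(patterns[0], r' \1 ', 'a+=b')   # 'a = b'
whole = re.sub(patterns[0], r' \g<0> ', 'a+=b')    # 'a += b'
print(last_only, '|', whole)
```

On the question's input each match is a single operator, so \1 and \g<0> give the same result there; the difference only shows up when operators are adjacent.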