Jens de Bruijn - 4 years ago
Python Question

Regex: match consecutive punctuation marks and replace by the first

I am trying to replace each run of consecutive punctuation marks with the first mark in that run. For example:


  1. u.s., -> u.s.

  2. u.s. -> u.s.

  3. u.s.! -> u.s.

  4. hiiii!!!, -> hiiii!



I tried the following code:

import re
r = re.compile(r'([.,/#!$%^&*;:{}=-_`~()])*\1')
n = r.sub(r'\1', "ews by almalki : Tornado, flood deaths reach 18 in U.s., more storms ahead ")
print(n)
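Apart from the hyphen range, the structure of this attempt also works against it: a repeated group such as ([...])* only keeps the character from its last repetition, so \1 can only match another copy of that same character. Running the pattern from the question, unchanged, on three of the examples above shows the effect. This only illustrates the failure; it is not a fix.

```python
import re

# The pattern from the question, unchanged.
attempt = re.compile(r'([.,/#!$%^&*;:{}=-_`~()])*\1')

examples = ["u.s.,", "u.s.!", "hiiii!!!,"]
for text in examples:
    # \1 must repeat the character captured by the group's LAST repetition,
    # so a two-character run of different marks such as ".," never matches.
    print(repr(text), "->", repr(attempt.sub(r'\1', text)))
# 'u.s.,' -> 'u.s.,'
# 'u.s.!' -> 'u.s.!'
# 'hiiii!!!,' -> 'hiiii!,'
```

Runs of different marks are left alone, and in hiiii!!!, only the repeated exclamation marks are collapsed, while the trailing comma survives.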

Answer

You just need to capture the first punctuation mark and match the rest:

([.,/#!$%^&*;:{}=_`~()-])[.,/#!$%^&*;:{}=_`~()-]+


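Note that the - must be placed at the end (or start) of the character class so that it does not create a range; alternatively, it can be escaped as \- inside the class. In the pattern from the question, =-_ is parsed as a range from = to _, which also matches >, ?, @, [, \, ], ^ and every uppercase letter A-Z, but not the hyphen itself.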
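A quick check on a made-up sample string shows the difference between a range and a literal hyphen inside a character class:

```python
import re

sample = "A=b-Z_"

# "=-_" inside a class is a RANGE from '=' (0x3D) to '_' (0x5F):
# it matches the uppercase letters but not the hyphen.
print(re.findall(r'[=-_]', sample))    # ['A', '=', 'Z', '_']

# With '-' placed last, or escaped as '\-', it is just a literal hyphen.
print(re.findall(r'[=_-]', sample))    # ['=', '-', '_']
print(re.findall(r'[=\-_]', sample))   # ['=', '-', '_']
```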

Details:

  • ([.,/#!$%^&*;:{}=_`~()-]) - a capturing group that matches one of the punctuation symbols you defined (this first mark is kept)
  • [.,/#!$%^&*;:{}=_`~()-]+ - one or more further punctuation symbols (these are dropped)
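The two parts listed above can also be written out in Python's verbose mode (re.VERBOSE), which ignores whitespace and # comments outside character classes (inside a class, # is still literal), so each piece of the pattern carries its own explanation. This is a sketch of the same pattern, not a different technique; the name collapse_punct is made up for illustration.

```python
import re

# Same pattern as above, in verbose mode so each part is annotated.
collapse_punct = re.compile(r'''
    ( [.,/#!$%^&*;:{}=_`~()-] )   # group 1: the first punctuation mark (kept)
    [.,/#!$%^&*;:{}=_`~()-]+      # one or more marks right after it (dropped)
''', re.VERBOSE)

# Replacing each match with group 1 keeps only the first mark of every run.
print(collapse_punct.sub(r'\1', "u.s.!"))   # u.s.
```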

Python demo:

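import re
# Group 1 captures the first punctuation mark; the second class matches the
# one or more marks that follow it. Each match is replaced by group 1.
r = re.compile(r'([.,/#!$%^&*;:{}=_`~()-])[.,/#!$%^&*;:{}=_`~()-]+')
n = r.sub(r'\1', "ews by almalki : Tornado, flood deaths reach 18 in U.s., more storms ahead ")
print(n)  # ews by almalki : Tornado, flood deaths reach 18 in U.s. more storms ahead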
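The character class in the answer is a hand-picked set. If every ASCII punctuation character should count instead, one possible variant (a swapped-in choice, not the answer's set) builds the class from string.punctuation with re.escape. Be aware that this set is larger: it also contains the quotes " and ', plus +, <, >, ?, @, [, \, ] and |, so for example ?! would now collapse to ?. The helper name collapse is hypothetical.

```python
import re
import string

# Hypothetical variant: treat every ASCII punctuation character as a mark.
# re.escape() backslash-escapes '-', ']', '^' and the backslash itself,
# so the result is a valid character class.
PUNCT = '[' + re.escape(string.punctuation) + ']'
collapse_all_punct = re.compile('(' + PUNCT + ')' + PUNCT + '+')

def collapse(text):
    """Replace each run of two or more punctuation marks with its first mark."""
    return collapse_all_punct.sub(r'\1', text)

for text in ["u.s.,", "u.s.", "u.s.!", "hiiii!!!,", "why?!"]:
    print(text, "->", collapse(text))
```

With the answer's original set, why?! would be left unchanged, because ? is not in that class.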