David Faux - 1 year ago 76
Python Question

How do I separate words using regex in Python while considering words with apostrophes?

I tried to find all the standalone m's in a string with a Python regex that uses word boundaries. These m's should either have whitespace on both sides or begin/end the string:

r = re.compile("\\bm\\b")
re.findall(r, someString)

However, this method also finds m's within words like I'm, since apostrophes are considered to be word boundaries. How do I write a regex that doesn't consider apostrophes as word boundaries?

I've tried this:

r = re.compile("(\\sm\\s) | (^m) | (m$)")
re.findall(r, someString)

but that just doesn't match any m. Odd.

Answer Source

Use lookaround assertions:

>>> import re
>>> re.findall(r'(?<=\s)m(?=\s)|^m|m$', "I'm a boy")
[]
>>> re.findall(r'(?<=\s)m(?=\s)|^m|m$', "I m a boy")
['m']
>>> re.findall(r'(?<=\s)m(?=\s)|^m|m$', "mama")
['m']
>>> re.findall(r'(?<=\s)m(?=\s)|^m|m$', "pm")
['m']


(?=...)
Matches if ... matches next, but doesn’t consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.


(?<=...)
Matches if the current position in the string is preceded by a match for ... that ends at the current position. This is called a positive lookbehind assertion. (?<=abc)def will find a match in abcdef, ...

from Regular expression syntax
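
Note that the ^m and m$ alternatives in the pattern above check only one side, so they still match the first m in "mama" and the m in "pm". If the m has to be delimited on both sides, one possible variant (a sketch, not part of the original answer) is to use negative lookarounds: (?<!\S) succeeds when the previous character is whitespace or there is no previous character, and (?!\S) does the same for the next character. The name STANDALONE_M is made up for illustration.

```python
import re

# Hypothetical name. Matches "m" only when it is not touching any
# non-whitespace character, so the start and end of the string also
# count as delimiters.
STANDALONE_M = re.compile(r"(?<!\S)m(?!\S)")

for text in ["I'm a boy", "I m a boy", "mama", "pm", "m", "a m"]:
    # Only "I m a boy", "m" and "a m" yield a match.
    print(repr(text), STANDALONE_M.findall(text))
```

With this variant the apostrophe in "I'm" blocks the match simply because it is a non-whitespace character, so no special handling of apostrophes is needed.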

BTW, if you use a raw string (r'this is a raw string'), you don't need to escape \.

>>> r'\s' == '\\s'
True
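
As for why the second attempt in the question finds nothing: the spaces around each | are part of the pattern, so every alternative also requires a literal space in the input. The sketch below compares the pattern as written with two alternatives, one with the spaces removed and one compiled with re.VERBOSE, which makes the engine ignore unescaped whitespace in the pattern. The variable names are illustrative only.

```python
import re

# The pattern from the question, spaces included: each space in the
# pattern must match a literal space character in the input.
with_spaces = re.compile("(\\sm\\s) | (^m) | (m$)")

# The same alternatives with the spaces removed.
without_spaces = re.compile(r"(\sm\s)|(^m)|(m$)")

# re.VERBOSE tells the regex engine to skip unescaped whitespace in the
# pattern, so the spaced-out layout can be kept.
verbose = re.compile(r"(\sm\s) | (^m) | (m$)", re.VERBOSE)

text = "I m a boy"
print(with_spaces.findall(text))     # no match
print(without_spaces.findall(text))  # one tuple, one entry per group
print(verbose.findall(text))         # same result as without_spaces
```

Because the pattern contains three groups, findall returns tuples rather than plain strings, and \sm\s includes the surrounding whitespace in the match. The lookaround version avoids both issues.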