Acapello Acapello - 4 months ago 10
Python Question

Regular expression in Python to split a string by a symbol without deleting it

I am trying to write a function that does something like this:

>>> foo("String. New sentence. And again.")
["String.", "New sentence.", "And again."]


I read the documentation on regex and wrote this code:

import re

def foo(string):
    return re.split(r'([.!?])', string)


This function keeps the punctuation, but separates it from the sentences:

['String', '.', ' New sentence', '.', ' And again', '.', '']


I want to have the 3 whole sentences separated.

How can I improve this function?

Answer

Match the characters before the ending delimiter together with the delimiter itself, and use re.findall instead of re.split:

>>> import re
>>> s = "String. New sentence. And again."
>>> re.findall(r'[^ ].*?[.!?]', s)
['String.', 'New sentence.', 'And again.']

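The [^ ] matches a single character that is not a space, so each match starts on something other than a space. The .*? lazily matches as few characters as possible (newlines excluded) up to the first [.!?] that follows, and that punctuation mark is included in the match.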
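
For completeness, here is a minimal sketch of how the pattern above could be dropped back into the foo function from the question. The second function, foo_split, is a hypothetical alternative that is not part of the original answer: it keeps re.split but uses a lookbehind, so the split happens on the whitespace after the punctuation rather than on the punctuation itself. Both assume that sentences end in '.', '!' or '?' and are separated by spaces.

```python
import re

# Pattern from the answer: one non-space character, then as few characters
# as possible, up to and including the first '.', '!' or '?'.
SENTENCE_RE = re.compile(r'[^ ].*?[.!?]')

# Assumed alternative: split on whitespace that directly follows
# a '.', '!' or '?' (the lookbehind does not consume the punctuation).
BOUNDARY_RE = re.compile(r'(?<=[.!?])\s+')


def foo(string):
    """Return the sentences in `string`, each keeping its end punctuation."""
    return SENTENCE_RE.findall(string)


def foo_split(string):
    """Hypothetical variant: split after the punctuation instead of matching."""
    return BOUNDARY_RE.split(string)


print(foo("String. New sentence. And again."))
# ['String.', 'New sentence.', 'And again.']

print(foo("Hi there! How are you? Fine"))
# ['Hi there!', 'How are you?']

print(foo_split("Hi there! How are you? Fine"))
# ['Hi there!', 'How are you?', 'Fine']
```

Note the difference on the last input: the findall version drops a trailing fragment that has no closing punctuation, while the lookbehind split keeps it. Which behaviour is preferable depends on the input you expect.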