hazrmard - 2 months ago
Python Question

How to split up a string on multiple delimiters but only capture some?

I want to split a string on any combination of delimiters I provide. For example, if the string is:

s = 'This, I think,., کباب MAKES , some sense '


And the delimiters are '\.', ',', and '\s'. However, I want to capture all delimiters except whitespace ('\s'). The output should be:

['This', ',', 'I', 'think', ',.,', 'کباب', 'MAKES', ',', 'some', 'sense']


My solution so far uses the re module:

import re
pattern = r'([\.,\s]+)'
re.split(pattern, s)


However, this captures whitespace as well. I have tried other patterns like '[(\.)(,)\s]+', but they don't work.
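A likely reason the second pattern fails, shown as a small sketch: inside a character class, parentheses are literal characters rather than capture groups, so re.split captures nothing and drops every delimiter. The sample strings below are made up for illustration.

```python
import re

# Hypothetical sample strings, chosen only to illustrate the behaviour.
sample = 'a, b.c'

# Inside [...], '(' and ')' are literal characters, not capture groups,
# so nothing is captured and the delimiters are discarded.
no_groups = re.split(r'[(\.)(,)\s]+', sample)
print(no_groups)  # ['a', 'b', 'c']

# As a side effect, the same class also splits on literal parentheses.
parens = re.split(r'[(\.)(,)\s]+', 'f(x) y')
print(parens)  # ['f', 'x', 'y']
```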

Edit: @PadraicCunningham made an astute observation. For a string like 'Some text ,. , some more text', I'd only want to remove the leading and trailing whitespace from the delimiter ',. ,' and not the whitespace within it.

Answer

The following approach is probably the simplest: split with the capturing pattern, then strip each piece and discard the ones that were only whitespace.

import re

s = 'This, I think,., کباب MAKES , some sense '
pattern = r'([\.,\s]+)'
# Strip each piece and drop the ones that were only whitespace.
splitted = [i.strip() for i in re.split(pattern, s) if i.strip()]

The output:

['This', ',', 'I', 'think', ',.,', 'کباب', 'MAKES', ',', 'some', 'sense']
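The same split-then-strip idea also seems to cover the case from the question's edit, where whitespace inside a delimiter run should survive. Below is a minimal sketch wrapped in a helper; the name split_keep_delims is made up here and is not part of the re module.

```python
import re

def split_keep_delims(text):
    """Split on runs of '.', ',' and whitespace; keep non-whitespace delimiters.

    Hypothetical helper: strip() only trims the ends of each piece,
    so whitespace inside a delimiter run such as ',. ,' is preserved.
    """
    parts = re.split(r'([.,\s]+)', text)
    stripped = (part.strip() for part in parts)
    # Pieces that were pure whitespace (or empty) are dropped.
    return [part for part in stripped if part]

print(split_keep_delims('Some text ,. , some more text'))
# ['Some', 'text', ',. ,', 'some', 'more', 'text']
```

Because the whole delimiter run is captured as one group, a run made only of whitespace strips down to an empty string and is filtered out, while mixed runs keep their inner spacing.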