Mangostaniko - 4 months ago
Python Question

Python regex split into characters except if followed by parentheses

I have a string like

"F(230,24)F[f(22)_(23);(2)%[+(45)FF]]"
where each character, apart from the parentheses and what they enclose, represents a kind of instruction. A character can optionally be followed by a list of arguments in parentheses.

I would like to split such a string into
['F(230,24)', 'F', '[', 'f(22)', '_(23)', ';(2)', '%', '[', '+(45)', 'F', 'F', ']', ']']
but at the moment I only get
['F(230,24)', 'F', '[', 'f(22)_(23);(2)', '%', '[', '+(45)', 'F', 'F', ']', ']']
(the substring 'f(22)_(23);(2)' is not split correctly).

Currently I am using
list(filter(None, re.split(r'([A-Za-z\[\]\+\-\^\&\\\/%_;~](?!\())', string)))
which is just a mess of characters and a negative lookahead for "(". The list(filter(None, <list>)) wrapper is used to remove empty strings from the result.

I am aware that this is likely caused by Python's re.split having been designed not to split on a zero-length match, as discussed here. However, I was wondering what a good solution would be. Is there a better way than re.findall?

Thank you.

EDIT: Unfortunately I am not allowed to use custom packages such as the regex module.

Answer

You can use re.findall to find every single character that is optionally followed by a pair of parentheses:

import re
s = "F(230,24)F[f(22)_(23);(2)%[+(45)FF]]"
re.findall(r"[^()](?:\([^()]*\))?", s)  # raw string, so \( and \) reach the regex engine unchanged

['F(230,24)',
 'F',
 '[',
 'f(22)',
 '_(23)',
 ';(2)',
 '%',
 '[',
 '+(45)',
 'F',
 'F',
 ']',
 ']']
  • [^()] matches a single character other than a parenthesis;
  • (?:\([^()]*\))? is a non-capturing group (?:...) that matches an opening parenthesis, any characters that are not parentheses, and a closing parenthesis; the trailing ? makes the whole group optional.
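If the tokens need further processing, the same pattern can be wrapped in a small helper. The following is only a sketch: the names TOKEN, tokenize and parse are made up for illustration and do not appear in the question, and it assumes arguments are comma-separated and never contain nested parentheses.

```python
import re

# One instruction character, optionally followed by "(...)" with no nested parentheses.
TOKEN = re.compile(r"[^()](?:\([^()]*\))?")


def tokenize(program):
    """Split an instruction string into tokens such as 'F(230,24)' or '['."""
    return TOKEN.findall(program)


def parse(token):
    """Return (instruction, args), where args is a list of argument strings."""
    instruction, rest = token[0], token[1:]
    # rest is either '' or '(...)'; drop the parentheses before splitting on commas.
    inner = rest[1:-1]
    args = inner.split(",") if inner else []
    return instruction, args


tokens = tokenize("F(230,24)F[f(22)_(23);(2)%[+(45)FF]]")
parsed = [parse(t) for t in tokens]
print(tokens)
print(parsed[:3])
```

Note that re.findall silently skips anything the pattern cannot match, such as an unbalanced parenthesis, so malformed input is not reported as an error.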