Mangostaniko Mangostaniko -5 years ago 90
Python Question

Python regex split into characters except if followed by parentheses

I have a string like
F(230,24)F[f(22)_(23);(2)%[+(45)FF]]
, where each character except for parentheses and what they enclose represents a kind of instruction. A character can be followed by an optional list of arguments enclosed in parentheses.

I would like to split such a string into
['F(230,24)', 'F', '[', 'f(22)', '_(23)', ';(2)', '%', '[', '+(45)', 'F', 'F', ']', ']']
, however at the moment I only get
['F(230,24)', 'F', '[', 'f(22)_(23);(2)', '%', '[', '+(45)', 'F', 'F', ']', ']']
(a substring was not split correctly).

Currently I am using
list(filter(None, re.split(r'([A-Za-z\[\]\+\-\^\&\\\/%_;~](?!\())', string)))
, which is just a mess of characters and a negative lookahead for an opening parenthesis.
list(filter(None, <list>))
is used to remove empty strings from the result.
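
For reference, the behaviour described above can be reproduced with the short sketch below. It reuses the pattern from the question unchanged; the variable names `pattern`, `s` and `parts` are only illustrative. In the run f(22)_(23);(2) each of f, _ and ; is directly followed by an opening parenthesis, so the lookahead rejects all three as split points and the run stays in one piece.

```python
import re

# Pattern copied from the question: an instruction character that is
# NOT followed by "(" acts as a (captured) delimiter.
pattern = r'([A-Za-z\[\]\+\-\^\&\\\/%_;~](?!\())'
s = "F(230,24)F[f(22)_(23);(2)%[+(45)FF]]"

# filter(None, ...) drops the empty strings re.split leaves between
# adjacent delimiters.
parts = list(filter(None, re.split(pattern, s)))
print(parts)
# ['F(230,24)', 'F', '[', 'f(22)_(23);(2)', '%', '[', '+(45)', 'F', 'F', ']', ']']
```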

I am aware that this is likely caused by Python's re.split
having been designed not to split on a zero-length match, as discussed here.
However, I was wondering what would be a good solution. Is there a better way than re.split?

Thank you.

EDIT: Unfortunately i am not allowed to use custom packages like

Answer Source

You can use re.findall to find every single character that is optionally followed by a pair of parentheses:

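import re
s = "F(230,24)F[f(22)_(23);(2)%[+(45)FF]]"
# Raw string so that \( and \) reach the regex engine unchanged
print(re.findall(r"[^()](?:\([^()]*\))?", s))
# ['F(230,24)', 'F', '[', 'f(22)', '_(23)', ';(2)', '%', '[', '+(45)', 'F', 'F', ']', ']']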

  • [^()] matches a single character other than a parenthesis;
  • (?:\([^()]*\))? is a non-capturing group, written (?:...), that matches an opening parenthesis, any run of non-parenthesis characters, and a closing parenthesis; the trailing ? makes the whole group optional.
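
If the instruction and its arguments are needed separately, the same pattern can be given capturing groups. The following is only a sketch under a few assumptions: the helper name `parse_instructions` is made up for illustration, arguments are assumed to be comma-separated, and parentheses are assumed never to nest.

```python
import re

# Group 1: the instruction character.
# Group 2: the text between the optional parentheses (None if absent).
TOKEN = re.compile(r"([^()])(?:\(([^()]*)\))?")

def parse_instructions(s):
    """Return a list of (instruction, [args]) tuples (hypothetical helper)."""
    result = []
    for m in TOKEN.finditer(s):
        raw_args = m.group(2)
        # No parentheses (None) and empty parentheses ('') both give [].
        args = raw_args.split(",") if raw_args else []
        result.append((m.group(1), args))
    return result

print(parse_instructions("F(230,24)F[f(22)"))
# [('F', ['230', '24']), ('F', []), ('[', []), ('f', ['22'])]
```

The arguments are left as strings here; converting them to numbers depends on what each instruction expects.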