Symon Symon - 8 days ago 4
Python Question

Split string with regex separator except when separator is escaped

I have the following code (treat 'Z' as the escape character and ',' as the separator):

import re

a = 'aaa,bbbZ,cccZZ,dddZZZ,eee'
print(re.split(r'(?<!Z)[,]+', a))

Result is:

['aaa', 'bbbZ,cccZZ,dddZZZ,eee']

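But I need the result to take escaped sequences into account (in my example the escape character is 'Z', so 'ZZ' is a literal Z and only a comma preceded by an odd number of Zs is escaped):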

['aaa', 'bbbZ,cccZZ', 'dddZZZ,eee']

When I try to use a variable-width pattern in the negative lookbehind assertion:

print(re.split(r'(?<!(ZZ)*Z)[,]+', a))

it fails with:

sre_constants.error: look-behind requires fixed-width pattern


Instead of splitting, you may match the fields themselves with a pattern that matches either any character that is not a comma, or one or more commas preceded by an odd number of Zs:

import re
a = 'aaa,bbbZ,cccZZ,dddZZZ,eee'
print(re.findall(r'(?:(?<!Z)Z(?:ZZ)*,+|[^,])+', a))
# => ['aaa', 'bbbZ,cccZZ', 'dddZZZ,eee']

See the Python demo and a regex demo.

Pattern details:

  • (?:(?<!Z)Z(?:ZZ)*,+|[^,])+ - 1 or more occurrences of:
    • (?<!Z)Z - a Z not immediately preceded by another Z (the start of a run of Zs)
    • (?:ZZ)* - zero or more sequences of ZZ
    • ,+ - 1 or more commas
    • | - or
    • [^,] - any char that is not a comma
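
The pattern above can be generalized to other separator and escape characters. The following is only a minimal sketch: the function name split_unescaped and its parameters sep and esc are hypothetical, and it assumes both are single characters. Like the findall approach above, it returns no empty fields.

```python
import re

def split_unescaped(text, sep=',', esc='Z'):
    """Return the fields of text, treating sep as escaped when it is
    preceded by an odd number of esc characters.

    Hypothetical helper: sep and esc are assumed to be single characters.
    """
    s, e = re.escape(sep), re.escape(esc)
    # First alternative: an odd-length run of esc followed by 1+ separators
    # (an escaped separator, kept inside the field).
    # Second alternative: any single character that is not the separator.
    pattern = r'(?:(?<!{e}){e}(?:{e}{e})*{s}+|[^{s}])+'.format(s=s, e=e)
    return re.findall(pattern, text)

print(split_unescaped('aaa,bbbZ,cccZZ,dddZZZ,eee'))
# => ['aaa', 'bbbZ,cccZZ', 'dddZZZ,eee']
```

The escape characters are left in the returned fields, as in the original pattern; removing them would be a separate step.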