Adam - 1 year ago 46
Python Question

# Regular expression to capture groups delimited by an expression

How can I capture `[0-9]+` groups delimited by `(\], \[)`? For example, in the case of

```
[[[[u'1', u'2'], u'3'], u'4'], [[[u'1', u'2'], u'4'], [[u'1', u'5'], u'4']]]
```

I would like to capture three groups: `1 2 3 4`, `1 2 4`, and `1 5 4`.

Assuming that you don't have sub-patterns like `[[u'1', u'2'], [u'3',u'5']]` (multiple nested sub-groups at the same level, in which case you would need a stack and a parser that works like a pushdown automaton), you can do this with regular expressions in two steps:

(1) Split the string on the regex `\]\s*,\s*\[` to get the groups first; for the example provided, this yields 3 groups.

(2) Within each group, use the regex `[^0-9u]*u'([0-9]+)'[^0-9u]*` to extract the digits.

For example, in `R`, the code will be:

```r
str <- "[[[[u'1', u'2'], u'3'], u'4'], [[[u'1', u'2'], u'4'], [[u'1', u'5'], u'4']]]"
groups <- unlist(strsplit(str, split = '\\]\\s*,\\s*\\['))
pattern <- "[^0-9u]*u'([0-9]+)'[^0-9u]*"
lapply(groups, function(str) gsub(pattern, "\\1", regmatches(str, gregexpr(pattern, str))[[1]]))

#[[1]]
#[1] "1" "2" "3" "4"

#[[2]]
#[1] "1" "2" "4"

#[[3]]
#[1] "1" "5" "4"
```

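In `Python`:

```python
import re

# Use a name other than `str` so the built-in type is not shadowed.
s = "[[[[u'1', u'2'], u'3'], u'4'], [[[u'1', u'2'], u'4'], [[u'1', u'5'], u'4']]]"
# Step 1: split on the "], [" delimiter (raw string keeps the backslashes literal).
groups = re.split(r'\]\s*,\s*\[', s)
# Step 2: capture the digits inside each u'...' literal.
pattern = r"[^0-9u]*u'([0-9]+)'[^0-9u]*"
print([re.findall(pattern, g) for g in groups])
# [['1', '2', '3', '4'], ['1', '2', '4'], ['1', '5', '4']]
```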

You could map the digits to integers if required.
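As a rough sketch of that last step, the conversion could look something like this. It repeats the split-and-extract steps so that it runs on its own; the names `text`, `digit_groups` and `int_groups` are only illustrative and not part of the original answer.

```python
import re

# Same sample string as in the question.
text = "[[[[u'1', u'2'], u'3'], u'4'], [[[u'1', u'2'], u'4'], [[u'1', u'5'], u'4']]]"

# Step 1: split on the "], [" delimiter.
groups = re.split(r"\]\s*,\s*\[", text)

# Step 2: extract the digit strings from each group.
pattern = r"[^0-9u]*u'([0-9]+)'[^0-9u]*"
digit_groups = [re.findall(pattern, g) for g in groups]

# Convert every captured digit string to an int.
int_groups = [[int(d) for d in g] for g in digit_groups]
print(int_groups)
# [[1, 2, 3, 4], [1, 2, 4], [1, 5, 4]]
```

A nested list comprehension keeps the group structure intact; flattening with a single loop would lose the boundaries between the three groups.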
