Liz Liz - 3 months ago 19
Python Question

Unpack a string into an expanded string

I am given a string in the following format:

"a{1;4:6}"
and
"a{1;2}b{2:4}"
where the
;
represents two different numbers, and a
:
represents a sequence of numbers. There can be any number of combinations of semicolons and colons within the brace.

I want to expand it such that these are the results of expanding the two examples above:


  • "a{1;4:6}"
    = "a1a4a5a6"

  • "a{1;2}b{2:4}" = "a1b2b3b4a2b2b3b4"



I've never had to deal with something like this before, since I am usually given strings in some sort of ready-made format which is easily parsable. In this case I have to parse the string manually.

My attempt is to split the string manually, over and over again, until you hit a case where there is either a colon or a semicolon, then start building the string from there. This is horribly inefficient, and I would appreciate any thoughts on this approach. Here is essentially what the code looks like (I omitted a lot of it, just to get the point across more quickly):

>>> s = "a{1;4:6}"
>>> splitted = s.split("}")
>>> splitted
['a{1;4:6', '']
>>> splitted2 = [s.split("{") for s in splitted]
>>> splitted2
[['a', '1;4:6'], ['']]
>>> splitted3 = [s.split(";") for s in splitted2[0]]
>>> splitted3
[['a'], ['1', '4:6']]

# ... etc, then build up the strings manually once the ranges are figured out.


The thinking behind splitting at the close brace at first is that it is guaranteed that a new identifier, with an associated range comes up after it. Where am I going wrong? My approach works for simple strings such as the first example, but it doesn't for the second example. Furthermore it is inefficient. I would be thankful for any input on this problem.

Answer
import re

def expand(compressed):

    # 'b{2:4}' -> 'b{2;3;4}' i.e. reduce the problem to just one syntax
    normalized = re.sub(r'(\d+):(\d+)', lambda m: ';'.join(map(str, range(int(m.group(1)), int(m.group(2)) + 1))), compressed)

    # 'a{1;2}b{2;3;4}' -> ['a{1;2}', 'b{2;3;4}']
    elements = re.findall(r'[a-z]\{[\d;]+\}', normalized)

    tokens = []

    # ['a{1;2}', 'b{2;3;4}'] -> [['a1', 'a2'], ['b2', 'b3', 'b4']]
    for element in elements:
        match = re.match(r'([a-z])\{([\d;]+)\}', element)

        alphanumerics = []  # match result already guaranteed by re.findall()

        for number in match.group(2).split(';'):
            alphanumerics.append(match.group(1) + number)

        tokens.append(alphanumerics)

    # [['a1', 'a2'], ['b2', 'b3', 'b4']] -> 'a1b2b3b4a2b2b3b4'
    def pack_tokens(tokens):

        current, *rest = tokens

        if not rest:
            return ''.join(current)  # base case

        return ''.join(token + pack_tokens(rest) for token in current)

    return pack_tokens(tokens)

strings = ['a{1;4:6}', 'a{1;2}b{2:4}', 'a{1;2}b{2:4}c{3;6}']

for string in strings:
    print(string, '->', expand(string))

OUTPUT

a{1;4:6} -> a1a4a5a6
a{1;2}b{2:4} -> a1b2b3b4a2b2b3b4
a{1;2}b{2:4}c{3;6} -> a1b2c3c6b3c3c6b4c3c6a2b2c3c6b3c3c6b4c3c6
Comments