Nylithius Nylithius - 3 months ago 9
JSON Question

Python recursive function or cycle to convert string to json logical object

I have this function:

def req_splitter(req_string):
req = {}
if " AND " in req_string:
cond = "AND"
req_splitted = req_string.split(" AND ")
elif " OR " in req_string:
cond = "OR"
req_splitted = req_string.split(" OR ")
else:
cond = "AND"
req_splitted = [req_string]

if len(req_splitted) > 1:
for sub_req in req_splitted:
sub_req_splitted = req_splitter(sub_req)
req[cond] = list()#new_req
req[cond].append(sub_req_splitted)
else:
req[cond] = req_splitted
return req


It is intended to convert to json-logic conditions a strings like this one:

Barracks AND Tech Lab
Lair OR Hive
Hatchery OR Lair OR Hive
Cybernetics Core AND Gateway OR Warpgate
Forge AND Twilight Council AND Ground Armor 1
Spire OR Greater Spire AND Hive AND Flyer Flyer Carapace 2
Spire OR Greater Spire AND Lair OR Hive AND Flyer Attacks 1


json_logic condition is looks like this:

{
"and": [
{
"or": [
"Gateway",
"Warpgate"
]
},
"Cybernetics Core"
]
}


How my recursive function should work to help me to split the string to the condition object like the example above?




To help you understand the problem:

json_logic is a module that checks the condition, like the dictionary you see above and returns some result, depending with what you compare it.

And how the condition works:
key-value par is one single logical statement.
The key stands for logical condition.
And the values in list is stands for operands.
if a value itself not a list but a dictionary, it recurses.

You can compare it to "polish notation"

And last thing - AND statements has more priority than OR statements, and OR statements are always together.

Answer

You'll need to write a simple top-down parser. The inimitable effbot wrote a great tutorial about just that kind of thing.

Tokenizing is just splitting on the r'\s+(OR|AND)\s+' regex, then recognising OR and AND as operators, the rest as literals. The AND and OR token .led() methods could flatten out directly nested operators of the same type.

I've implemented what is described there using a little more OOP (and not globals), and made it Python 2 and 3 compatible:

import re
from functools import partial


class OpAndToken(object):
    lbp = 10
    op = 'and'
    def led(self, parser, left):
        right = parser.expression(self.lbp)
        operands = []
        for operand in left, right:
            # flatten out nested operands of the same type
            if isinstance(operand, dict) and self.op in operand:
                operands.extend(operand[self.op])
            else:
                operands.append(operand)
        return {self.op: operands}


class OpOrToken(OpAndToken):
    lbp = 20
    op = 'or'


class LiteralToken(object):
    def __init__(self, value):
        self.value = value
    def nud(self):
        return self.value


class EndToken(object):
    lbp = 0


class Parser(object):
    operators = {'AND': OpAndToken, 'OR': OpOrToken}
    token_pat = re.compile("\s+(AND|OR)\s+")

    def __init__(self, program):
        self.program = program
        self.tokens = self.tokenizer()
        self.token = next(self.tokens)

    def expression(self, rbp=0):
        t = self.token
        self.token = next(self.tokens)
        left = t.nud()
        while rbp < self.token.lbp:
            t = self.token
            self.token = next(self.tokens)
            left = t.led(self, left)
        return left

    def tokenizer(self):
        for tok in self.token_pat.split(self.program):
            if tok in self.operators:
                yield self.operators[tok]()
            else:
                yield LiteralToken(tok)
        yield EndToken()

    def parse(self):
        return self.expression()

This parses your format into the expected output:

>>> Parser('foo AND bar OR spam AND eggs').parse()
{'and': ['foo', {'or': ['bar', 'spam']}, 'eggs']}

Demo on your input lines:

>>> from pprint import pprint
>>> tests = '''\
... Barracks AND Tech Lab
... Lair OR Hive
... Hatchery OR Lair OR Hive
... Cybernetics Core AND Gateway OR Warpgate
... Forge AND Twilight Council AND Ground Armor 1
... Spire OR Greater Spire AND Hive AND Flyer Flyer Carapace 2
... Spire OR Greater Spire AND Lair OR Hive AND Flyer Attacks 1
... '''.splitlines()
>>> for test in tests:
...     pprint(Parser(test).parse())
...
{'and': ['Barracks', 'Tech Lab']}
{'or': ['Lair', 'Hive']}
{'or': ['Hatchery', 'Lair', 'Hive']}
{'and': ['Cybernetics Core', {'or': ['Gateway', 'Warpgate']}]}
{'and': ['Forge', 'Twilight Council', 'Ground Armor 1']}
{'and': [{'or': ['Spire', 'Greater Spire']}, 'Hive', 'Flyer Flyer Carapace 2']}
{'and': [{'or': ['Spire', 'Greater Spire']},
         {'or': ['Lair', 'Hive']},
         'Flyer Attacks 1']}

Note that for multiple OR or AND operators in a row the operands are combined.

I'll leave adding support for (...) parentheses to the reader; the tutorial shows you how to do this (just make the advance() function a method on the Parser class and pass the parser in to .nud() calls too, or pass in the parser when creating token class instances).