JokerMartini - 5 months ago
Python Question

Recursively collect string blocks in Python

I have a custom data file formatted like this:

data = {
    friends = {
        max = 0 0,
        min = 0 0,
    },
    family = {
        cars = {
            van = "honda",
            car = "ford",
            bike = "trek",
        },
        presets = {
            location = "italy",
            size = 10,
            travelers = False,
        },
        version = 1,
    },
}

I want to collect the blocks of data, meaning the string between each pair of {}, while maintaining the hierarchy. This data is not in a typical JSON format, so a JSON parser is not a possible solution.

My idea was to create a class object like so

class Block:
    def __init__(self, header, children):
        self.header = header
        self.children = children

I would then loop through the data line by line, 'somehow' collecting the necessary data, so that my resulting output would look something like this:

Block("data = {}", [
    Block("friends = {max = 0 0,\n min = 0 0,}", []),
    Block("family = {version = 1}", [...])
])

In short, I'm looking for help on ways to parse this into useful data that I can then easily manipulate. My approach is to break it into objects, using the {} as dividers.
If anyone has suggestions for a better approach, I'm open to ideas. Thank you again.

So far I've only implemented these basic snippets of code:

class Block:
    def __init__(self, content, children):
        self.content = content
        self.children = children

def GetBlock(strArr=None):
    # Avoid a mutable default argument; fall back to an empty list.
    strArr = strArr or []
    print(len(strArr))
    # blocks = []
    blockStart = "{"
    blockEnd = "}"

# filepath is the path to the custom data file
with open(filepath, 'r') as file:
    data = file.readlines()
    blocks = GetBlock(strArr=data)


You can create a to_block function that takes the lines from your file as an iterator and recursively creates a nested dictionary from those. (Of course you could also use a custom Block class, but I don't really see the benefit in doing so.)

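def to_block(lines):
    block = {}
    for line in lines:
        line = line.strip()
        # A closing brace ends the current block; hand control back to the caller.
        if line in ("}", "},"):
            break
        key, value = map(str.strip, line.split(" = ", 1))
        if value.endswith("{"):
            # The shared iterator continues right after this header line.
            value = to_block(lines)
        block[key] = value
    return block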

When calling it, you have to strip the first and last lines, though. Also, evaluating the "leaves" to e.g. numbers or strings is left as an exercise for the reader.

>>> to_block(iter(data.splitlines()[1:-1]))
{'data': {'family': {'version': '1,', 
                     'cars': {'bike': '"trek",', 'car': '"ford",', 'van': '"honda",'}, 
                     'presets': {'travelers': 'False,', 'size': '10,', 'location': '"italy",'}}, 
          'friends': {'max': '0 0,', 'min': '0 0,'}}}
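For the leaf evaluation mentioned above, here is a minimal sketch of one way to do it. It assumes that leaves are either Python-style literals (quoted strings, numbers, True/False) or free text such as 0 0 that should stay a string. The names parse_leaf and convert_leaves are hypothetical helpers, not part of the original answer.

```python
import ast

def parse_leaf(text):
    """Convert one raw leaf such as '"trek",' or '10,' to a Python value."""
    text = text.strip().rstrip(",")
    try:
        # literal_eval safely handles quoted strings, numbers, True/False/None.
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Anything that is not a literal (e.g. '0 0') is kept as a plain string.
        return text

def convert_leaves(block):
    """Recursively apply parse_leaf to every non-dict value of a nested dict."""
    return {
        key: convert_leaves(value) if isinstance(value, dict) else parse_leaf(value)
        for key, value in block.items()
    }

# Sample input in the shape that to_block returns (values still carry commas).
raw = {
    "friends": {"max": "0 0,", "min": "0 0,"},
    "family": {
        "version": "1,",
        "cars": {"bike": '"trek",'},
        "presets": {"travelers": "False,", "size": "10,"},
    },
}
converted = convert_leaves(raw)
print(converted)
```

Using ast.literal_eval rather than eval keeps the conversion limited to literals, so arbitrary expressions in the data file are never executed.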

Alternatively, you can do some preprocessing to transform that string into a JSON(-ish) string and then use json.loads. However, I would not go all the way here; instead, just wrap the values in "" (replacing the original " with ' first). Otherwise, there is too much risk of accidentally turning a string with spaces into a list or similar. You can sort those out once you've created the JSON data.

>>> import json, re
>>> data = data.replace('"', "'")
>>> data = re.sub(r'= (.+),$',     r'= "\1",', data, flags=re.M)
>>> data = re.sub(r'^\s*(\w+) = ', r'"\1": ',  data, flags=re.M)
>>> data = re.sub(r',$\s*}',       r'}',       data, flags=re.M)
>>> json.loads(data)
{'data': {'family': {'version': '1', 
                     'presets': {'size': '10', 'travelers': 'False', 'location': "'italy'"}, 
                     'cars': {'bike': "'trek'", 'van': "'honda'", 'car': "'ford'"}}, 
          'friends': {'max': '0 0', 'min': '0 0'}}}
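Both approaches produce a nested dictionary. If the Block objects from the question are still wanted, they could be built from that dictionary afterwards. The following is only a sketch: the extra values attribute and the to_blocks helper are assumptions added for illustration and do not appear in the question or the answer.

```python
class Block:
    """A node holding its header, its own key/value entries, and child blocks."""

    def __init__(self, header, values, children):
        self.header = header
        self.values = values        # leaf entries that belong to this block
        self.children = children    # nested Block objects

    def __repr__(self):
        return "Block(%r, %r, %r)" % (self.header, self.values, self.children)

def to_blocks(header, mapping):
    """Recursively turn a nested dict into a tree of Block objects."""
    values = {k: v for k, v in mapping.items() if not isinstance(v, dict)}
    children = [to_blocks(k, v) for k, v in mapping.items() if isinstance(v, dict)]
    return Block(header, values, children)

# Sample input in the shape returned by json.loads above (abbreviated).
parsed = {
    "data": {
        "friends": {"max": "0 0", "min": "0 0"},
        "family": {"version": "1", "cars": {"van": "'honda'"}},
    }
}
root = to_blocks("data", parsed["data"])
print(root)
```

Separating leaf values from child blocks keeps the hierarchy explicit while still letting each block's own settings be looked up by key.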