susdu susdu - 2 months ago 8
Python Question

python: regex - catch variable number of groups

I have a string that looks like:

TABLE_ENTRY.0[hex_number]= <FIELD_1=hex_number, FIELD_2=hex_number..FIELD_X=hex>
TABLE_ENTRY.1[hex_number]= <FIELD_1=hex_number, FIELD_2=hex_number..FIELD_Y=hex>

The number of fields is unknown and varies from entry to entry. I want to capture
each entry separately, with all of its fields and their values.

I came up with:


which matches the table entry and the first field, but I don't know how to account for a variable number of fields.

For the input:

ENTRY_0[0x130]=0: <FIELD_0=0, FIELD_1=0x140... FIELD_2=0xff3>

output should be:



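In short, you can't do all of this with a single pattern in the re engine. The number of groups is fixed by the pattern, so you cannot generate more groups dynamically; a repeated group keeps only its last match. Instead, capture all of the fields in one group and re-parse that group, like so: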

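import re

input_str = ("TABLE_ENTRY.0[0x1234]= <FIELD_1=0x1234, FIELD_2=0x1234, FIELD_3=0x1234>\n"
             "TABLE_ENTRY.1[0x1235]= <FIELD_1=0x1235, FIELD_2=0x1235, FIELD_3=0x1235>")
results = {}
# Group 1 is the entry name, group 2 is everything between "<" and ">".
for match in re.finditer(r"([A-Z_0-9\.]+\[0x[0-9A-F]+\])=\s+<(.*)>", input_str):
    # Re-parse the field list: split into "NAME=value" items, then into pairs.
    fields = match.group(2).split(", ")
    results[match.group(1)] = dict(f.split("=") for f in fields)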

>>> results
{'TABLE_ENTRY.0[0x1234]': {'FIELD_2': '0x1234', 'FIELD_1': '0x1234', 'FIELD_3': '0x1234'}, 'TABLE_ENTRY.1[0x1235]': {'FIELD_2': '0x1235', 'FIELD_1': '0x1235', 'FIELD_3': '0x1235'}}

The output is one large dict that maps each table entry to a dict of its fields.
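As a rough illustration of why the second parsing step is needed, here is what happens when a single pattern tries to capture every field with a repeated group. The sample line and the pattern are made up for this demonstration and are not taken from the question.

```python
import re

# Hypothetical sample line in the same shape as the input above.
line = "TABLE_ENTRY.0[0x1234]= <FIELD_1=0x1, FIELD_2=0x2, FIELD_3=0x3>"

# Group 2 sits inside a repeated non-capturing group, so it is matched
# once per field, but the match object has only one slot for it.
pattern = re.compile(
    r"([A-Z_0-9.]+\[0x[0-9A-Fa-f]+\])=\s+<(?:(\w+=\w+)(?:,\s*)?)+>"
)
match = pattern.match(line)

# Only the last repetition survives in group 2.
print(match.groups())
# ('TABLE_ENTRY.0[0x1234]', 'FIELD_3=0x3')
```

The pattern matches the whole line, yet FIELD_1 and FIELD_2 are lost, which is why the field list is captured as one string and split afterwards.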

The nested dict is also rather convenient, since you can look up a single field directly:

>>> results["TABLE_ENTRY.0[0x1234]"]["FIELD_2"]
'0x1234'

I personally suggest stripping off "TABLE_ENTRY", as it's repetitive, but that's up to you.
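If you do strip the prefix, one possible sketch is to key the result by the entry index instead of the full name. The names `entry_re`, `field_re`, `entries`, and the `"address"` key for the bracketed hex number are assumptions made for illustration; the question does not say what that number means.

```python
import re

input_str = (
    "TABLE_ENTRY.0[0x1234]= <FIELD_1=0x1234, FIELD_2=0x1234, FIELD_3=0x1234>\n"
    "TABLE_ENTRY.1[0x1235]= <FIELD_1=0x1235, FIELD_2=0x1235, FIELD_3=0x1235>"
)

# Capture the entry index and the bracketed hex number separately,
# leaving the constant "TABLE_ENTRY." prefix out of the key.
entry_re = re.compile(r"TABLE_ENTRY\.(\d+)\[(0x[0-9A-Fa-f]+)\]=\s*<(.*)>")
# Each field is a NAME=value pair; findall returns them as 2-tuples.
field_re = re.compile(r"(\w+)=(\w+)")

entries = {}
for m in entry_re.finditer(input_str):
    index, address, body = m.groups()
    entries[int(index)] = {
        "address": address,  # hypothetical name for the bracketed number
        "fields": dict(field_re.findall(body)),
    }

print(entries[0]["fields"]["FIELD_2"])
# 0x1234
```

Using `findall` for the fields instead of splitting on ", " also tolerates uneven spacing between fields.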