susdu susdu - 10 months ago 43
Python Question

python: regex - catch variable number of groups

I have a string that looks like:

TABLE_ENTRY.0[hex_number]= <FIELD_1=hex_number, FIELD_2=hex_number..FIELD_X=hex>
TABLE_ENTRY.1[hex_number]= <FIELD_1=hex_number, FIELD_2=hex_number..FIELD_Y=hex>

number of fields is unknown and varies from entry to entry, I want to capture
each entry separately with all of its fields and their values.

I came up with:


which matches the table entry and the first field, but I dont know how to account for variable number of fields.

for input:

ENTRY_0[0x130]=0: <FIELD_0=0, FIELD_1=0x140... FIELD_2=0xff3>

output should be:


Answer Source

In short, it's impossible to do all of this in the re engine. You cannot generate more groups dynamically. It will all put it in one group. You should re-parse the results like so:

import re
input_str = ("TABLE_ENTRY.0[0x1234]= <FIELD_1=0x1234, FIELD_2=0x1234, FIELD_3=0x1234>\n"
             "TABLE_ENTRY.1[0x1235]= <FIELD_1=0x1235, FIELD_2=0x1235, FIELD_3=0x1235>")
results = {}
for match in re.finditer(r"([A-Z_0-9\.]+\[0x[0-9A-F]+\])=\s+<(.*)>", input_str):
    fields =", ")
    results[] = dict(f.split("=") for f in fields)

>>> results
{'TABLE_ENTRY.0[0x1234]': {'FIELD_2': '0x1234', 'FIELD_1': '0x1234', 'FIELD_3': '0x1234'}, 'TABLE_ENTRY.1[0x1235]': {'FIELD_2': '0x1235', 'FIELD_1': '0x1235', 'FIELD_3': '0x1235'}}

The output will just be a large dict consisting of a table entry, to a dict of it's fields.

It's also rather convinient as you may do this:

>>> results["TABLE_ENTRY.0[0x1234]"]["FIELD_2"]

I personally suggest stripping off "TABLE_ENTRY" as it's repetative but as you wish.