Kael Tee - 2 months ago
Python Question

Regex to parse out the values that I want

Using https://regex101.com/


  • My current regex expression:
    ^.*'(\d\s*.*)'*$



which doesn't seem to be working. What is the right pattern to use?

I want to be able to parse out four values, namely
items, quantity, cost and total.

MY CODE:

import re

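text = "xxxxxxxxxxxxxxxxxx"  # renamed from str to avoid shadowing the built-in
match = re.match(r"^.*'(\d\s*.*)'*$", text)
if match:  # re.match returns None when the pattern does not match
    print(match.group(1))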

Answer

The following regex matches each ingredient string and stores the wanted information in groups: r'^(\d+)\s+([A-Za-z ]+)\s+(\d+(?:\.\d*))$'

It defines three groups, separated from one another by whitespace:

  • ^ marks the string start
  • (\d+) is the first group and looks for at least one digit
  • \s+ is the first separator between groups and matches at least one whitespace character
  • ([A-Za-z ]+) is the second group and matches at least one alphabetical character or space
  • \s+ is the second separator between groups and matches at least one whitespace character
  • (\d+(?:\.\d*)) is the third group and matches at least one digit followed by a decimal point and zero or more further digits (as written, the decimal point is required; add ? after the non-capturing group to make it optional)
  • $ marks the string end
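
As a quick check of the breakdown above, here is a minimal sketch that applies the pattern to a single line from the test data and unpacks the three groups. The names LINE_RE, quantity, item and cost are only illustrative. It also shows two kinds of line that the pattern, as written, does not match.

```python
import re

# Same pattern as in the answer; the name LINE_RE is only illustrative.
LINE_RE = re.compile(r'^(\d+)\s+([A-Za-z ]+)\s+(\d+(?:\.\d*))$')

# An item line: the three groups come back as strings.
match = LINE_RE.match('1 ONION RINGS 6.00')
quantity, item, cost = match.groups()
print(quantity, item, cost)

# A line that does not start with a digit is rejected by the first group.
no_match_total = LINE_RE.match('TOTAL 59.00')

# A price without a decimal point is rejected by the third group.
no_match_integer = LINE_RE.match('1 ONION RINGS 6')

print(no_match_total, no_match_integer)
```

Because match() returns None for non-matching lines, it is worth testing the result before calling groups() on it.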

The regex that captures the total is simple enough that it should not need an explanation.

Here is some test code using your test data. It should be a good starting point:

import re

TEST_DATA = ['Table: Waiter: kenny',
             '======================================',
             '1 SAUSAGE WRAPPED WITH B 10.00',
             '1 ESCARGOT WITH GARLIC H 12.00',
             '1 PAN SEARED FOIE GRAS 15.00',
             '1 SAUTE FIELD MUSHROOM W 9.00',
             '1 CRISPY CHICKEN WINGS 7.00',
             '1 ONION RINGS 6.00',
             '----------------------------------',
             'TOTAL 59.00',
             'CASH 59.00',
             'CHANGE 0.00',
             'Signature:__________________________',
             'Thank you & see you again soon!']

INGREDIENT_RE = re.compile(r'^(\d+)\s+([A-Za-z ]+)\s+(\d+(?:\.\d*))$')
TOTAL_RE = re.compile(r'^TOTAL (.+)$')

ingredients = []
total = None
for string in TEST_DATA:
    match = INGREDIENT_RE.match(string)
    if match:
        ingredients.append(match.groups())
        continue
    match = TOTAL_RE.match(string)
    if match:
        total = match.groups()[0]
        break

print(ingredients)
print(total)

This prints:

[('1', 'SAUSAGE WRAPPED WITH B', '10.00'), ('1', 'ESCARGOT WITH GARLIC H', '12.00'), ('1', 'PAN SEARED FOIE GRAS', '15.00'), ('1', 'SAUTE FIELD MUSHROOM W', '9.00'), ('1', 'CRISPY CHICKEN WINGS', '7.00'), ('1', 'ONION RINGS', '6.00')]
59.00
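
All captured groups are strings. If you need numbers, one possible follow-up is to convert them and check that the item prices add up to the total. This is only a sketch: it assumes the last column is the price for the whole line (every quantity in the test data is 1, so the data cannot confirm this), and it uses Decimal rather than float to avoid rounding surprises with money.

```python
import re
from decimal import Decimal

INGREDIENT_RE = re.compile(r'^(\d+)\s+([A-Za-z ]+)\s+(\d+(?:\.\d*))$')
TOTAL_RE = re.compile(r'^TOTAL (.+)$')

# The item and total lines from the test data above.
lines = ['1 SAUSAGE WRAPPED WITH B 10.00',
         '1 ESCARGOT WITH GARLIC H 12.00',
         '1 PAN SEARED FOIE GRAS 15.00',
         '1 SAUTE FIELD MUSHROOM W 9.00',
         '1 CRISPY CHICKEN WINGS 7.00',
         '1 ONION RINGS 6.00',
         '----------------------------------',
         'TOTAL 59.00']

items = []
total = None
for line in lines:
    match = INGREDIENT_RE.match(line)
    if match:
        quantity, name, cost = match.groups()
        # Convert the quantity to int and the price to Decimal.
        items.append((int(quantity), name, Decimal(cost)))
        continue
    match = TOTAL_RE.match(line)
    if match:
        total = Decimal(match.group(1))

# Assumption: the last column is already the line price, so it is summed as is.
computed = sum(cost for _, _, cost in items)
print(computed, total, computed == total)
```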

Edit on Python raw strings:

The r prefix before a Python string literal marks it as a raw string, which means that escape sequences (like \t, \n, etc.) are not interpreted.

For example, in a standard string \t is a single tab character. In a raw string it is two characters: \ and t.

r'\t' is equivalent to '\\t'.
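
A small sketch to see the difference for yourself (the variable names are only illustrative):

```python
import re

normal = '\t'   # one character: a tab
raw = r'\t'     # two characters: a backslash and the letter t

print(len(normal), len(raw))
print(raw == '\\t')

# In a regex, the raw form passes the backslash through to the re module,
# which then interprets \d as "a digit".
digits = re.findall(r'\d+', 'TOTAL 59.00')
print(digits)
```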

More details are in the Python documentation.