mengg - 2 months ago
Python Question

Parse arithmetic string with regular expression

I need to parse an arithmetic string that contains only multiplication (*) and addition (+), e.g., 300+10*51+20+2*21, using regular expressions.

I have the working code below:

import re

input_str = '300+10*51+20+2*21'
#input_str = '1*2+3*4'

prod_re = re.compile(r"(\d+)\*(\d+)")
sum_re = re.compile(r"(\d+)\+?")

result = 0
index = 0
while (index <= len(input_str)-1):
    #-----
    prod_match = prod_re.match(input_str, index)
    if prod_match:
        # print 'find prod', prod_match.groups()
        result += int(prod_match.group(1))*int(prod_match.group(2))
        index += len(prod_match.group(0))+1
        continue
    #-----
    sum_match = sum_re.match(input_str, index)
    if sum_match:
        # print 'find sum', sum_match.groups()
        result += int(sum_match.group(1))
        index += len(sum_match.group(0))
        continue
    #-----
    if (not prod_match) and (not sum_match):
        print('None match, check input string')
        break

print(result)


I am wondering if there is a way to avoid creating the variable index above?

Answer

The algorithm is not correct in general. An input with chained multiplications, such as 2*3*4, does not yield the correct result: the code prints 10 instead of 24. After resolving one multiplication it skips the next character and treats the following number as an addend, while in such cases you have to resolve more multiplications before doing any additions.
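
A minimal sketch of where the index-based loop breaks down. It wraps the matching logic from the question in a function (the name eval_with_index is only for illustration, and it raises instead of printing on bad input) and runs it on a chained product:

```python
import re

prod_re = re.compile(r"(\d+)\*(\d+)")
sum_re = re.compile(r"(\d+)\+?")

def eval_with_index(input_str):
    """Hypothetical wrapper around the question's loop, same matching logic."""
    result = 0
    index = 0
    while index <= len(input_str) - 1:
        prod_match = prod_re.match(input_str, index)
        if prod_match:
            result += int(prod_match.group(1)) * int(prod_match.group(2))
            # Skips the character after the product, even when it is '*'.
            index += len(prod_match.group(0)) + 1
            continue
        sum_match = sum_re.match(input_str, index)
        if sum_match:
            result += int(sum_match.group(1))
            index += len(sum_match.group(0))
            continue
        raise ValueError("no match, check input string")
    return result

print(eval_with_index('300+10*51+20+2*21'))  # 872, which is correct
print(eval_with_index('2*3*4'))              # 10, although 2*3*4 is 24
```

The second call goes wrong because prod_re only ever consumes two factors, so the trailing 4 is picked up by sum_re and added.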

With some changes to the regular expressions and loops, you can get the correct result, and do without the index variable, as follows:

import re

input_str = '3+1*2+3*4'

# match terms, which may include multiplications
sum_re = re.compile(r"(\d+(?:\*\d+)*)(?:\+|$)")
# match factors, which can only be numbers 
prod_re = re.compile(r"\d+")

result = 0
# find terms
for sum_match in sum_re.findall(input_str):
    # for each term, determine its value by applying the multiplications
    product = 1
    for prod_match in prod_re.findall(sum_match):
        product *= int(prod_match)
    # add the term's value to the result
    result += product

print(result)
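
One thing to keep in mind is that findall silently skips any text the pattern does not match, so a malformed string such as 3+a+4 would still produce a number. A possible variant, sketched here under the assumption that you want bad input rejected, validates the whole string first with fullmatch (Python 3.4+). The function name evaluate and the pattern names are illustrative, not part of the code above:

```python
import re

# Whole expression: numbers separated by '+' or '*', nothing else.
expr_re = re.compile(r"\d+(?:[*+]\d+)*")
# A term: one number, optionally followed by more '*'-joined numbers.
term_re = re.compile(r"\d+(?:\*\d+)*")
# A factor: a single number.
factor_re = re.compile(r"\d+")

def evaluate(expr):
    """Evaluate a string of non-negative integers joined by '+' and '*'."""
    if not expr_re.fullmatch(expr):
        raise ValueError("not a valid +/* expression: %r" % expr)
    total = 0
    for term in term_re.findall(expr):
        product = 1
        for factor in factor_re.findall(term):
            product *= int(factor)
        total += product
    return total

print(evaluate('3+1*2+3*4'))          # 17
print(evaluate('300+10*51+20+2*21'))  # 872
print(evaluate('2*3*4'))              # 24
```

Because the string is validated up front, the term pattern no longer needs to look at the trailing + or the end of the string.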