Vinay Sawant - 4 months ago
Python Question

I want to match money amounts in Indian currency with a regex, without commas

I want to match amounts like

Rs. 2000 , Rs.2000 , Rs 20,000.00 ,20,000 INR 200.25 INR.


Output should be
2000,2000,20000.00,20000,200.25

The regular expression I have tried is this:

(?:(?:(?:rs)|(?:inr))(?:!-{0,}|\.{1}|\ {0,}|\.{1}\ {0,}))(-?[\d,]+ (?:\.\d+)?)(?:[^/^-^X^x])|(?:(-?[\d,]+(?:\.\d+)?)(?:(?:\ {0,}rs)|(?:\ {0,}rs)|(?:\ {0,}(inr))))


But it is not matching numbers with inr or rs after the amount.
I want to match them using the re library in Python.

Answer

I suggest using an alternation group with a capture group in each branch, so that only the numbers before or after your constant string values are captured:

(?:Rs\.?|INR)\s*(\d+(?:[.,]\d+)*)|(\d+(?:[.,]\d+)*)\s*(?:Rs\.?|INR)

See the regex demo.

Pattern explanation:

  • (?:Rs\.?|INR)\s*(\d+(?:[.,]\d+)*) - Branch 1:
    • (?:Rs\.?|INR) - matches Rs, Rs. or INR
    • \s* - followed by zero or more whitespace characters
    • (\d+(?:[.,]\d+)*) - Group 1: one or more digits, followed by zero or more sequences of a comma or a dot and one or more digits
  • | - or
  • (\d+(?:[.,]\d+)*)\s*(?:Rs\.?|INR) - Branch 2:
    • (\d+(?:[.,]\d+)*) - Group 2: captures the same kind of number as Group 1 in Branch 1
    • \s* - zero or more whitespace characters
    • (?:Rs\.?|INR) - followed by Rs, Rs. or INR
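The breakdown above can also be written into the pattern itself. This is only a sketch: it assumes the same two-branch pattern, laid out with re.VERBOSE so that each branch carries its own comment. The names AMOUNT, marker_first and marker_last are made up for the illustration.

```python
import re

# Same two-branch pattern as above; re.VERBOSE ignores the layout whitespace
# and lets each branch carry a comment.
AMOUNT = re.compile(r"""
    (?:Rs\.?|INR) \s* (\d+(?:[.,]\d+)*)    # Branch 1: marker first, amount in group 1
  |
    (\d+(?:[.,]\d+)*) \s* (?:Rs\.?|INR)    # Branch 2: amount in group 2, marker after
""", re.VERBOSE)

marker_first = AMOUNT.search("Rs. 2000")
marker_last = AMOUNT.search("20,000 INR")

print(marker_first.groups())  # only group 1 is filled
print(marker_last.groups())   # only group 2 is filled
```

Only one of the two groups takes part in any given match, which is why the sample code has to pick whichever one is non-empty.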

Sample code:

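import re

# Branch 1 captures the amount in group 1, Branch 2 in group 2
p = re.compile(r'(?:Rs\.?|INR)\s*(\d+(?:[.,]\d+)*)|(\d+(?:[.,]\d+)*)\s*(?:Rs\.?|INR)')
s = "Rs. 2000 , Rs.3000 , Rs 40,000.00 ,50,000 INR 600.25 INR"
# findall returns (group 1, group 2) tuples; exactly one of the two is non-empty
print([x or y for x, y in p.findall(s)])
# ['2000', '3000', '40,000.00', '50,000', '600.25']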

See the IDEONE demo
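The question asks for the amounts without commas and mentions lowercase inr and rs, neither of which the pattern handles by itself. A minimal sketch of one way to cover both, assuming the commas are always thousands separators: strip them after matching and compile with re.IGNORECASE. The helper name extract_amounts is hypothetical.

```python
import re

# re.IGNORECASE is an addition here so that lowercase "rs" / "inr" match too.
p = re.compile(
    r'(?:Rs\.?|INR)\s*(\d+(?:[.,]\d+)*)|(\d+(?:[.,]\d+)*)\s*(?:Rs\.?|INR)',
    re.IGNORECASE,
)

def extract_amounts(text):
    """Return the matched amounts as strings with commas removed."""
    # Each findall item is (group 1, group 2); one of them is an empty string.
    return [(g1 or g2).replace(",", "") for g1, g2 in p.findall(text)]

s = "Rs. 2000 , Rs.2000 , Rs 20,000.00 ,20,000 INR 200.25 INR."
print(extract_amounts(s))
# ['2000', '2000', '20000.00', '20000', '200.25']
```

The results stay strings so that 20000.00 keeps its trailing zeros; convert with decimal.Decimal if numeric values are needed.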

Alternatively, if you can use the PyPI regex module, you can use the branch reset construct (?|...|...), in which capture group IDs are reset within each branch:

>>> import regex as re
>>> rx = re.compile(r'(?|(?:Rs\.?|INR)\s*(\d+(?:[.,]\d+)*)|(\d+(?:[.,]\d+)*)\s*(?:Rs\.?|INR))')
>>> teststring = "Rs. 2000 , Rs.2000 , Rs 20,000.00 ,20,000 INR 200.25 INR."
>>> prices = [match.group(1) for match in rx.finditer(teststring)]
>>> print(prices)
['2000', '2000', '20,000.00', '20,000', '200.25']

Because of the branch reset, the capture group in either branch has ID 1, so match.group(1) always holds the number.
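If installing the regex module is not an option, a similar effect can be sketched with the standard re module by asking each match which group took part in it. This assumes the original two-group pattern, where exactly one capture group matches per hit; teststring is taken from the question.

```python
import re

p = re.compile(r'(?:Rs\.?|INR)\s*(\d+(?:[.,]\d+)*)|(\d+(?:[.,]\d+)*)\s*(?:Rs\.?|INR)')
teststring = "Rs. 2000 , Rs.2000 , Rs 20,000.00 ,20,000 INR 200.25 INR."

# m.lastindex is the index of the last capture group that matched:
# 1 when Branch 1 matched, 2 when Branch 2 matched.
prices = [m.group(m.lastindex) for m in p.finditer(teststring)]
print(prices)
# ['2000', '2000', '20,000.00', '20,000', '200.25']
```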