Vinay Sawant - 4 months ago
Python Question

I want to match money amounts with a regex for Indian currency

I want to match amounts like

Rs. 2000 , Rs.2000 , Rs 20,000.00 ,20,000 INR 200.25 INR.


The regular expression I have tried is this:

(?:(?:(?:rs)|(?:inr))(?:!-{0,}|\.{1}|\ {0,}|\.{1}\ {0,}))(-?[\d,]+ (?:\.\d+)?)(?:[^/^-^X^x])|(?:(-?[\d,]+(?:\.\d+)?)(?:(?:\ {0,}rs)|(?:\ {0,}rs)|(?:\ {0,}(inr))))


But it is not matching numbers with inr or rs after the amount.
I want to match it using the re library in Python.

Jan
Answer

Though slightly out of scope, here's a finger exercise with the newer and far superior regex module by Matthew Barnett (which supports subroutines and branch resets):

import regex as re

rx = re.compile(r"""
(?(DEFINE)
    (?<amount>\d[\d.,]+)    # amount, starting with a digit
    (?<currency1>Rs\.?\ ?)  # Rs, Rs. or Rs with space
    (?<currency2>INR)       # just INR
)

(?|
    (?&currency1)
    (?P<money>(?&amount))
|
    (?P<money>(?&amount))
    (?=\ (?&currency2))
)

""", re.VERBOSE)

teststring = "Rs. 2000 , Rs.2000 , Rs 20,000.00 ,20,000 INR 200.25 INR."
prices = [m.group('money') for m in rx.finditer(teststring)]
print(prices)

# ['2000', '2000', '20,000.00', '20,000', '200.25']


This uses subroutines and a branch reset (thanks to @Wiktor!).
See a demo on regex101.com.
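
Since the question asks for the standard re library, here is a rough equivalent without subroutines or branch resets. It is only a sketch under a few assumptions of mine that the question does not state: amounts are digits with optional comma groups and an optional decimal part, the currency marker is rs, rs. or inr in any letter case, and negative amounts are not handled. The names AMOUNT, prefixed, suffixed and find_amounts are made up for illustration.

```python
import re

# Assumed amount format: digits, optional comma-separated groups, optional decimals.
AMOUNT = r"\d+(?:,\d+)*(?:\.\d+)?"

rx = re.compile(
    rf"""
    \b(?:rs\.?|inr)\s*(?P<prefixed>{AMOUNT})    # currency marker before the amount
    |
    (?P<suffixed>{AMOUNT})\s*(?:rs|inr)\b       # currency marker after the amount
    """,
    re.VERBOSE | re.IGNORECASE,
)


def find_amounts(text):
    """Return the amount strings found in text, in order of appearance."""
    # Exactly one of the two named groups is set for each match.
    return [m.group("prefixed") or m.group("suffixed") for m in rx.finditer(text)]


teststring = "Rs. 2000 , Rs.2000 , Rs 20,000.00 ,20,000 INR 200.25 INR."
print(find_amounts(teststring))

# ['2000', '2000', '20,000.00', '20,000', '200.25']
```

Because re has no branch reset, the two alternatives use differently named groups and the helper picks whichever one matched. re.IGNORECASE is what lets the lowercase rs and inr in the pattern match the uppercase INR in the test string.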