Pang Ho Ming
Python Question

Replacing all numeric values with formatted strings

What I am trying to do is:

First, find all the numeric values in a string:

import re

input_string = "高露潔光感白輕悅薄荷牙膏100   79.80"

numbers = re.finditer(r'[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?', input_string)

for number in numbers:
    print("{} start > {}, end > {}".format(number.group(), number.start(0), number.end(0)))

Output:
100 start > 12, end > 15
79.80 start > 18, end > 23
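As a side note, each match object's span() method returns the same (start, end) pair as start(0) and end(0) in one call. The sketch below collects every match with its offsets into a list. It assumes the input has three spaces between the two numbers, which is what the offsets above (and the string used in the answer) imply.

```python
import re

# Same pattern as in the question: optional sign, digits with an optional
# fractional part, and an optional exponent.
NUMBER_RE = re.compile(r'[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?')

input_string = "高露潔光感白輕悅薄荷牙膏100   79.80"

# Collect (matched text, (start, end)) pairs; span() is equivalent to
# (start(0), end(0)).
matches = [(m.group(), m.span()) for m in NUMBER_RE.finditer(input_string)]
print(matches)  # [('100', (12, 15)), ('79.80', (18, 23))]
```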


Then I want to replace every integer and float value with a placeholder in this format:

INT_(number of digits)
and
FLT_(number of decimal places)


For example:
100 -> INT_3, 79.80 -> FLT_2
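The mapping rule on its own can be written as a small helper. This is a minimal sketch; the function name placeholder is hypothetical, and it assumes the input has already been matched as a plain number (optional sign, digits, optional fractional part, no exponent). Signs are ignored when counting digits.

```python
def placeholder(number_text):
    """Hypothetical helper: map a numeric string to INT_<digits> or FLT_<decimals>.

    Assumes number_text is a plain number such as "100", "-7" or "79.80"
    (no exponent); a leading sign is not counted.
    """
    digits = number_text.lstrip("+-")
    if "." in digits:
        # Float: count only the digits after the decimal point.
        return "FLT_{}".format(len(digits.split(".", 1)[1]))
    # Integer: count all of its digits.
    return "INT_{}".format(len(digits))


print(placeholder("100"))    # INT_3
print(placeholder("79.80"))  # FLT_2
```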


So the expected output string is:

"高露潔光感白輕悅薄荷牙膏INT_3   FLT_2"


However, Python's str.replace() method doesn't let me do what I want here.

So I tried building the result by slicing the string and concatenating the pieces:

string[:number.start(0)] + "INT_%s"%len(number.group()) +.....


That looks clumsy and, more importantly, I still can't get it to work.
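For reference, the slicing approach can work if every slice is taken from the original string. A likely pitfall with in-place concatenation is that each replacement changes the string's length, so the start/end offsets of later matches no longer line up. The sketch below (the function name replace_numbers_by_slicing is hypothetical) keeps a running offset into the original text and joins the pieces at the end. It reuses the answer's regex, whose capturing group excludes the sign and exponent.

```python
import re

# Group 1 captures the digits and optional fractional part only.
NUMBER_RE = re.compile(r'[-+]?([0-9]*\.?[0-9]+)(?:[eE][-+]?[0-9]+)?')


def replace_numbers_by_slicing(text):
    """Hypothetical helper: rebuild text from slices of the ORIGINAL string."""
    pieces = []
    last_end = 0  # end offset of the previous match in the original text
    for m in NUMBER_RE.finditer(text):
        # Keep the unchanged text between the previous match and this one.
        pieces.append(text[last_end:m.start()])
        number = m.group(1)
        if "." in number:
            pieces.append("FLT_{}".format(len(number.split(".")[1])))
        else:
            pieces.append("INT_{}".format(len(number)))
        last_end = m.end()
    # Append whatever follows the last match.
    pieces.append(text[last_end:])
    return "".join(pieces)


print(replace_numbers_by_slicing("高露潔光感白輕悅薄荷牙膏100   79.80"))
```

This is essentially what re.sub does internally, which is why the callback approach in the answer is shorter.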

Can anyone give me some advice on this problem?

Answer

Use re.sub with a callback function, in which you can manipulate each match however you need:

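import re

def repl(match):
    # Group 1 holds the number without its sign or exponent
    chunks = match.group(1).split(".")
    if len(chunks) == 2:
        # Float: return the number of digits after the decimal point
        return "FLT_{}".format(len(chunks[1]))
    else:
        # Integer: return the number of digits
        return "INT_{}".format(len(chunks[0]))

input_string = "高露潔光感白輕悅薄荷牙膏100   79.80"
result = re.sub(r'[-+]?([0-9]*\.?[0-9]+)(?:[eE][-+]?[0-9]+)?', repl, input_string)
print(result)  # 高露潔光感白輕悅薄荷牙膏INT_3   FLT_2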


Details:

  • The regex now has a capturing group around the number part, ([0-9]*\.?[0-9]+), whose contents are analyzed inside the repl function.
  • Inside repl, the Group 1 text is split on "." to check whether it is a float. If it is, the function returns the length of the fractional part; otherwise, it returns the length of the integer.
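Because the sign and the exponent sit outside Group 1, they are consumed by the match but not counted. The sketch below runs the answer's repl on a few extra inputs (the sample strings are illustrative, not from the question) to show this: a leading "+" or "-" disappears, and an exponent such as "e10" is dropped along with it.

```python
import re


def repl(match):
    # Same callback as in the answer: inspect only Group 1.
    chunks = match.group(1).split(".")
    if len(chunks) == 2:
        return "FLT_{}".format(len(chunks[1]))
    return "INT_{}".format(len(chunks[0]))


PATTERN = r'[-+]?([0-9]*\.?[0-9]+)(?:[eE][-+]?[0-9]+)?'

samples = ["+42", "-2.5e10", ".5", "3E-4"]
# The whole match (sign and exponent included) is replaced by the placeholder.
results = {s: re.sub(PATTERN, repl, s) for s in samples}
for s in samples:
    print(s, "->", results[s])
# +42 -> INT_2
# -2.5e10 -> FLT_1
# .5 -> FLT_1
# 3E-4 -> INT_1
```

If signs or exponents should influence the placeholder, they would need their own capturing groups.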