Yonlif, 4 years ago
Python Question

How to replace every number in a string with the string "NUM" in Python

I need to tokenize a text. Take this string, for example:

"hello 502world a0.0.3b .1.4 <sub>5</sub>"


I want to turn it into:
"hello NUMworld aNUMb NUM <sub>5</sub>"


Notice that 0.0.3 and .1.4 each turn into a single NUM, just as 502 does, but a number inside <sub> tags should stay unchanged.

The text has non-ASCII characters in it.

Notice again that a number between <sub> tags should stay a number.

This is only an example; the text is from here.

Answer Source

A solution using the re.sub function:

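import re

s = "hello 502world a0.0.3b .1.4 <sub>5</sub>"

# Replace each run of digits (with an optional leading dot) that does not
# come directly after "<sub>".
marked = re.sub(r'(?<!<sub>)\.?\d+', 'NUM', s)
# Collapse adjacent replacements, so "0.0.3" becomes one NUM rather than three.
replaced = re.sub(r'(NUM){2,}', 'NUM', marked)

print(replaced)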

The output:

hello NUMworld aNUMb NUM <sub>5</sub>
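
A possible variation, offered as a sketch and not part of the original answer: the lookbehind above only protects the first digit after <sub>, so "<sub>55</sub>" would come out as "<sub>5NUM</sub>". One way around this is to match whole <sub>...</sub> spans in the same pattern and put them back unchanged, which also removes the need for the second pass. The names PATTERN and replace_numbers are chosen here for illustration, and the sketch assumes <sub> tags are not nested and carry no attributes.

```python
import re

# Either a whole <sub>...</sub> span (captured in group 1 so it can be kept),
# or a number: an optional leading dot, digits, then any further ".digits" parts.
PATTERN = re.compile(r'(<sub>.*?</sub>)|\.?\d+(?:\.\d+)*')

def replace_numbers(text):
    # group(1) is None unless the <sub> alternative matched, so numbers
    # outside <sub> become "NUM" and <sub> spans are returned as they were.
    return PATTERN.sub(lambda m: m.group(1) or 'NUM', text)

print(replace_numbers("hello 502world a0.0.3b .1.4 <sub>5</sub>"))
print(replace_numbers("x<sub>55</sub> café 12"))
```

On a Python 3 str, \d matches Unicode decimal digits as well as 0-9, and the other characters pass through untouched, so non-ASCII text should not need special handling.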