Mpizos Dimitris - 1 month ago
Python Question

Using regex to remove digits from a string

I am trying to remove all digits from a string that are not attached to a word. Examples:

"python 3" => "python"
"python3" => "python3"
"1something" => "1something"
"2" => ""
"434" => ""
"python 35" => "python"
"1 " => ""
" 232" => ""


So far I have been using the following regular expression:

((?<=[ ])[0-9]+(?=[ ])|(?<=[ ])[0-9]+|^[0-9]$)


It handles some of the examples above correctly, but not all. Can anyone help, and explain what I am doing wrong?
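A quick way to see exactly which examples the regex above gets wrong is to run it over all of them. This is only a small check script; the `examples` dict is the list from the question rewritten as Python data, and `re.sub` simply deletes every match.

```python
import re

# The regex from the question, unchanged.
pattern = r'((?<=[ ])[0-9]+(?=[ ])|(?<=[ ])[0-9]+|^[0-9]$)'

# Input -> desired output, copied from the examples in the question.
examples = {
    "python 3": "python",
    "python3": "python3",
    "1something": "1something",
    "2": "",
    "434": "",
    "python 35": "python",
    "1 ": "",
    " 232": "",
}

for text, wanted in examples.items():
    got = re.sub(pattern, '', text)
    status = "ok" if got == wanted else "WRONG"
    print("{!r} => {!r} (wanted {!r}) {}".format(text, got, wanted, status))
```

'434' is left untouched because ^[0-9]$ only matches a string consisting of a single digit, and the other two alternatives require a space before the number. '1 ' survives for the same reason: $ does not match before a trailing space. The other mismatches ('python 3', 'python 35', ' 232') differ from the desired output only by leftover whitespace.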

Answer

Why not just use word boundaries?

\b\d+\b

Here is an example:

>>> import re
>>> words = ['python 3', 'python3', '1something', '2', '434', 'python 35', '1 ', ' 232']
>>> for word in words:
...     print("'{}' => '{}'".format(word, re.sub(r'\b\d+\b', '', word)))
...
'python 3' => 'python '
'python3' => 'python3'
'1something' => '1something'
'2' => ''
'434' => ''
'python 35' => 'python '
'1 ' => ' '
' 232' => ' '

Note that this does not remove the spaces before or after the number. I would advise calling strip() on the result; otherwise, you can probably use \b\d+\b\s* (to also remove the whitespace after the number) or something similar.
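Putting the two suggestions together, here is a minimal sketch that applies \b\d+\b and then calls strip(), checked against every example from the question. The helper name remove_standalone_numbers is just for illustration.

```python
import re

def remove_standalone_numbers(text):
    # \b\d+\b matches a run of digits with a word boundary on both sides,
    # i.e. a number that is not glued to letters, digits or underscores.
    # strip() then removes the whitespace left over at either end.
    return re.sub(r'\b\d+\b', '', text).strip()

# Input -> desired output, copied from the examples in the question.
examples = {
    "python 3": "python",
    "python3": "python3",
    "1something": "1something",
    "2": "",
    "434": "",
    "python 35": "python",
    "1 ": "",
    " 232": "",
}

for text, wanted in examples.items():
    assert remove_standalone_numbers(text) == wanted, text
print("all examples pass")
```

strip() only trims the ends, so a number in the middle of a string leaves two spaces behind ('python 3 rocks' becomes 'python  rocks'). If that matters, using \b\d+\b\s* instead removes the whitespace after the number as well ('python 3 rocks' becomes 'python rocks'); you still need strip() for a trailing number such as 'python 3'.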
