Boandlkramer - 9 months ago

RegEx/Python: optional whitespace not found

I've got a really weird problem. My (Python) regex looks like this:


With re.findall(), this should return two matches for the following text: "...from 71m² to 83m²"
However, only 83 is matched. The problem has something to do with the optional whitespace \s* between the number (\s\d{1,3}[.,]?\d{1,2}?) and the square meters (?:m\u00B2|qm): when I delete the \s*, only 71 is matched as expected. I have no idea what is wrong with my regex.
Thanks for your help!


Why don't you try using a positive lookahead? This matches one or more digits, as long as m² or qm follows them; an optional space is allowed between the number and the unit. Note that it does not handle a decimal comma: for 71,56 only 56 is returned.

>>> import re
>>> re.findall(r"\d+(?=\s?(?:m\u00B2|qm))", "from 71m² to 83m²")
['71', '83']
>>> re.findall(r"\d+(?=\s?(?:m\u00B2|qm))", "from 71,56 m² to 837,78 qm")
['56', '78']
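If the decimal part should be kept, one option is to make it an optional group in front of the lookahead. This is a minimal sketch, assuming the decimal separator is either "," or "." and that the unit is written as m² (U+00B2) or qm; the names AREA and find_areas are hypothetical helpers, not part of the original answer. Using a non-capturing group (?:m\u00B2|qm) instead of a character class also stops single letters such as a plain "m" from counting as a unit.

```python
import re

# Assumption: a value is an integer part with an optional decimal part
# introduced by "," or ".", followed by an optional whitespace character
# and then either "m²" (U+00B2) or "qm". The lookahead checks the unit
# without including it in the match.
AREA = re.compile(r"\d+(?:[.,]\d+)?(?=\s?(?:m\u00B2|qm))")


def find_areas(text):
    """Hypothetical helper: return every area value (as written) followed by a unit."""
    return AREA.findall(text)


print(find_areas("from 71m² to 83m²"))           # ['71', '83']
print(find_areas("from 71,56 m² to 837,78 qm"))  # ['71,56', '837,78']
print(find_areas("from 71 mm"))                  # [] ("mm" is not a unit here)
```

The values come back as strings; converting them to numbers (e.g. replacing "," with "." before calling float()) is left out here because the expected number format is an assumption.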

The lookahead approach does not take into account the words you have specified, but you can easily add that part back in. However, re.findall() returns non-overlapping matches: after each match, scanning resumes where that match ended. So if your pattern includes the start of the string, it will only ever return the first value: each match effectively 'chops out' the text it covers, so the second value is never found.
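The non-overlapping behavior can be seen directly in a small sketch. The patterns below are illustrative assumptions, not the asker's original regex (which is not shown): one deliberately lets a trailing .* consume the rest of the string, and another matches each keyword ("from", "to") separately so that each match stays short.

```python
import re

text = "from 71m² to 83m²"

# findall() resumes scanning after the end of each match, so matches never overlap.
print(re.findall(r"\d\d", "1234"))        # ['12', '34'] ('23' is never tried)

# A lookahead is zero-width: it checks text without consuming it,
# so a capturing group inside it can report overlapping values.
print(re.findall(r"(?=(\d\d))", "1234"))  # ['12', '23', '34']

# Hypothetical pattern: the trailing .* consumes the rest of the string,
# including "83m²", so only the first value is returned.
print(re.findall(r"from\s*(\d+).*", text))  # ['71']

# Hypothetical pattern: matching each keyword separately keeps the
# matches short, so both values are found.
print(re.findall(r"(?:from|to)\s*(\d+)", text))  # ['71', '83']
```

The same zero-width property is why the unit lookahead in the answer does not interfere with finding the next number: the unit itself is never consumed.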