Alexey Trofimov - 1 year ago 85

Python Regex: Proper way to extract separated numbers (AxBxC -> [A, B, C])

I am trying to extract sizes from a string that follows what I guess is a very common pattern, AxBxC, where the sizes A, B and C (int or float) are separated by x (possibly with spaces around the x):

import re

s = 'zzz 3062 0.2 aaa 15.8x20.2x12.2875 mm'


I expect to get only three numbers: [15.8, 20.2, 12.2875]
The only working approach I have so far is ugly:

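r1 = re.findall(r'(\d+\.?\d*) *x *', s)  # numbers followed by x
r2 = re.findall(r' *x *(\d+\.?\d*)', s)  # numbers preceded by x
r1.extend(r2)
print(set(r1))  # the set drops the duplicated middle value (order is arbitrary)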

{'15.8', '20.2', '12.2875'}


Is there a way to extract these numbers with a single, robust regexp?
Thanks.

Answer

It seems you need to match 2 or 3 x-separated float values. You may use

r'(\d[\d.]*)x(\d[\d.]*)(?:x(\d[\d.]*))?'

Details

  • (\d[\d.]*) - Group 1: a digit followed by zero or more digits and/or dots
  • x - a literal x
  • (\d[\d.]*) - Group 2: a digit followed by zero or more digits and/or dots
  • (?:x(\d[\d.]*))? - an optional sequence: an x followed by Group 3, which captures a digit and then zero or more digits and/or dots
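The bullet list above maps one-to-one onto a verbose form of the same pattern. As a sketch (the name SIZE_RE is just an illustrative choice), re.VERBOSE lets each part carry its own comment; since the pattern contains no spaces or # characters, it should behave exactly like the one-line version.

```python
import re

# Same pattern as the one-line version, written with re.VERBOSE so each
# part can be commented. Whitespace and "#" comments are ignored in this mode.
SIZE_RE = re.compile(r"""
    (\d[\d.]*)          # Group 1: a digit, then zero or more digits and/or dots
    x                   # a literal x
    (\d[\d.]*)          # Group 2: same shape as Group 1
    (?:x(\d[\d.]*))?    # optional: an x followed by Group 3
""", re.VERBOSE)

s = 'zzz 3062 0.2 aaa 15.8x20.2x12.2875 mm'
print(SIZE_RE.findall(s))  # [('15.8', '20.2', '12.2875')]
```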

In Python, use

re.findall(r'(\d[\d.]*)x(\d[\d.]*)(?:x(\d[\d.]*))?', s)
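Because the pattern has capturing groups, re.findall returns one tuple per match, and Group 3 comes back as an empty string when only two sizes are present. A minimal sketch for turning the matches into floats follows; the helper name extract_sizes and the allow_spaces switch are illustrative additions, not part of the answer. The spaced variant, which puts " *" around each x, is an assumption based on the question's note that the x may be surrounded by spaces; the answer's pattern does not allow them.

```python
import re

# Pattern from the answer: 2 or 3 x-separated numbers, no spaces around x.
STRICT = r'(\d[\d.]*)x(\d[\d.]*)(?:x(\d[\d.]*))?'
# Hypothetical variant (not from the answer): allow spaces around each x.
SPACED = r'(\d[\d.]*) *x *(\d[\d.]*)(?: *x *(\d[\d.]*))?'


def extract_sizes(s, allow_spaces=False):
    """Return one list of floats per AxB or AxBxC group found in s."""
    pattern = SPACED if allow_spaces else STRICT
    sizes = []
    for groups in re.findall(pattern, s):
        # findall yields '' for Group 3 when it did not take part in the match
        sizes.append([float(g) for g in groups if g])
    return sizes


print(extract_sizes('zzz 3062 0.2 aaa 15.8x20.2x12.2875 mm'))
print(extract_sizes('box 10 x 20 x 5.5 cm', allow_spaces=True))
```

Note that \d[\d.]* also accepts strings such as 1.2.3, for which float() raises ValueError; if such input is possible, a stricter number pattern like \d+(?:\.\d+)? may be a safer choice.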