Alex Donovan - 3 months ago
Java Question

Extracting dimension measurements using a regular expression

I am struggling to write a regular expression that extracts dimension-like values from sentences. Below are some samples I encountered:

Sample inputs and the values to extract:


  1. Image pixel 200x500 px blur - extract 200x500 px

  2. Image pixel 200 x 500 blurring - extract 200 x 500

  3. 100.22 x 200.55 x 90.55 mm is the size of the handphone - extract 100.22 x 200.55 x 90.55 mm

  4. The mobile phone is 100.22x200.55x90.55 mm in dimension. - extract 100.22x200.55x90.55 mm



My code so far is as follows:



String[] str_array = new String[4];
str_array[0] = "Image pixel 200x500 px blur";
str_array[1] = "Image pixel 200 x 500 blurring";
str_array[2] = "100.22 x 200.55 x 90.55 mm is the size of the handphone";
str_array[3] = "The mobile phone is 100.22x200.55x90.55 mm in dimension.";
for (int i = 0; i < str_array.length; i++) {
    Pattern pty_resolution_ratio_metrics_try = Pattern.compile("(\\d+)[\\.\\d]+(\\s*)x");
    Matcher matcher_value_metrics_error_try = pty_resolution_ratio_metrics_try.matcher(str_array[i]);
    while (matcher_value_metrics_error_try.find()) {
        System.out.println("index: " + i + "-" + matcher_value_metrics_error_try.group(0));
    }
}





The code above prints:


  • index: 0-200x

  • index: 1-200 x

  • index: 2-100.22 x

  • index: 2-200.55 x

  • index: 3-100.22x

  • index: 3-200.55x



Any regular expression suggestions? Need help on this.

Thanks!

Answer

You can use this regex:

((?:\\d[\\d\\s\\.x]+\\d)(?:\\s*(?:px|mm))?)

This regex matches a run of digits, whitespace, periods and x's that begins and ends with a digit, followed by an optional px or mm unit.

Alternatively, you could use a regex that checks that everything is in the right order, so numbers must be separated by an x rather than by whitespace alone:

((?:(?:[\\d\\.]+)(?:\\s*x\\s*(?:[\\d\\.]+))+)(?:\\s*(?:px|mm))?)

A complete example using the first regex:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {

        String[] texts = {"Image pixel 200x500 px blur",
            "Image pixel 200 x 500 blurring",
            "100.22 x 200.55 x 90.55 mm is the size of the handphone",
            "The mobile phone is 100.22x200.55x90.55 mm in dimension"};

        // Digits, whitespace, periods and x's between two digits, then an optional unit.
        String regex = "((?:\\d[\\d\\s\\.x]+\\d)(?:\\s*(?:px|mm))?)";

        Pattern p = Pattern.compile(regex);

        for (String text : texts) {
            Matcher m = p.matcher(text);
            // find() advances to each successive match within the same text.
            while (m.find()) {
                System.out.println(m.group());
            }
        }
    }
}

Prints out the following:

200x500 px
200 x 500
100.22 x 200.55 x 90.55 mm
100.22x200.55x90.55 mm
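
The stricter alternative is not exercised by the example above, so here is a minimal sketch that runs it side by side with the first regex. The class name DimensionExtractor, the helper extract, and the extra "Firmware 1.2.3 ..." input are made up for illustration. The number sub-pattern is also tightened from [\d.]+ to \d+(?:\.\d+)?, which is an assumption about what should count as a number, so treat this as one possible variant rather than the exact regex given above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DimensionExtractor {

    // Assumption: a number is digits with an optional fractional part (200, 100.22).
    static final String NUMBER = "\\d+(?:\\.\\d+)?";

    // Two or more numbers joined by "x", followed by an optional px/mm unit.
    static final Pattern STRICT = Pattern.compile(
            NUMBER + "(?:\\s*x\\s*" + NUMBER + ")+(?:\\s*(?:px|mm))?");

    // The looser first regex from the answer, kept for comparison.
    static final Pattern LOOSE = Pattern.compile(
            "\\d[\\d\\s\\.x]+\\d(?:\\s*(?:px|mm))?");

    // Hypothetical helper: collects every match of the pattern in the text.
    static List<String> extract(Pattern pattern, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String[] texts = {
            "Image pixel 200x500 px blur",
            "Image pixel 200 x 500 blurring",
            "100.22 x 200.55 x 90.55 mm is the size of the handphone",
            "The mobile phone is 100.22x200.55x90.55 mm in dimension.",
            // Made-up input: a version number that is not a dimension.
            "Firmware 1.2.3 supports 200x500 px"
        };
        for (String text : texts) {
            System.out.println("strict: " + extract(STRICT, text));
            System.out.println("loose:  " + extract(LOOSE, text));
        }
    }
}
```

On the four original samples both patterns return the same single match. On the made-up firmware sentence the loose pattern also picks up 1.2.3, because it only requires a digit at each end, while the strict pattern returns just 200x500 px since every number after the first must be preceded by an x.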