blizzle blizzle - 5 months ago 5
Python Question

Python Regular Expression: Matching Car Speeds without Highway Names

I'm trying to match speed descriptions on highway tickets, for example, these text lines:

"L A 16-25MPH" should return 2 groups: 16, 25

"LMT ACC 6-10" should return 2 groups: 6, 10

"6 OVER" should return 1 group: 6


I'm OK with all of the above situations, but I run into issues for strings with numbers that aren't related to speed, for example:

"LIMITED ACCESS SPEED I-75" should return no matches.

The closest expression I can get to capture everything I need is:

((?<!\w-)\d+)[^\d]*((?<!\w-)\d+)?

Using the Python regular expression engine, this matches 1 group on that string: 5.

For now, it's safe to assume that a letter followed by a hyphen (\w-) is what I'm trying to exclude with a negative lookbehind. I'm just not sure how to make the lookbehind apply to a whole run of digits (\d+) rather than to a single digit.
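To make the failure concrete, here is a minimal sketch of why the lookbehind lets 5 through and one possible way to close the gap. The names `original`, `number`, and `fixed` are illustrative only, and replacing `\w-` with `[A-Za-z]-` is an assumption based on the "letter then hyphen" wording (in Python, `\w` also matches digits, so `6-` would count as well).

```python
import re

# The pattern from the question. At "7" in "I-75" the lookbehind sees "I-"
# and rejects the position, but the engine then retries at "5", where the
# two preceding characters are "-7", so the lookbehind no longer applies.
original = re.compile(r"((?<!\w-)\d+)[^\d]*((?<!\w-)\d+)?")
print(original.search("LIMITED ACCESS SPEED I-75").group(1))  # 5

# Assumed fix: a second lookbehind, (?<!\d), forbids starting a match in the
# middle of a number, so the whole run of digits is accepted or rejected.
number = r"(?<![A-Za-z]-)(?<!\d)\d+"
fixed = re.compile(rf"({number})(?:\D*({number}))?")

for line in ["L A 16-25MPH", "LMT ACC 6-10", "6 OVER", "LIMITED ACCESS SPEED I-75"]:
    match = fixed.search(line)
    print(line, "->", match.groups() if match else None)
```

This sketch still accepts any other standalone number in the line, so it only addresses the letter-hyphen case described above.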

Answer

Description

([0-9]+)(?:-([0-9]+)|\s*over)

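This regular expression, used with the case-insensitive flag (the pattern spells "over" in lowercase while the data is uppercase), does the following:

  • matches numbers related to speeds
  • avoids numbers that are part of road names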
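As a rough sketch, the expression could be used from Python as follows. The helper name `extract_speeds` and the choice to return integers are assumptions for illustration, not part of the original answer. `re.IGNORECASE` is needed so that the lowercase `over` in the pattern matches "OVER".

```python
import re

# The answer's expression, compiled case-insensitively.
SPEED = re.compile(r"([0-9]+)(?:-([0-9]+)|\s*over)", re.IGNORECASE)

def extract_speeds(line):
    """Return the speed numbers found in a ticket description, or []."""
    match = SPEED.search(line)
    if match is None:
        return []
    # Group 2 is None when the "N OVER" alternative matched.
    return [int(g) for g in match.groups() if g is not None]

print(extract_speeds("L A 16-25MPH"))               # [16, 25]
print(extract_speeds("LMT ACC 6-10"))               # [6, 10]
print(extract_speeds("6 OVER"))                     # [6]
print(extract_speeds("LIMITED ACCESS SPEED I-75"))  # []
```

The road name is rejected because 75 is followed by neither a hyphen and more digits nor the word "over", so no lookbehind is required.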

Example

Live Demo

https://regex101.com/r/hE5dL4/2

Sample text

Note the I-75 edge case, which must not produce a match.

'm trying to match speed descriptions of highway tickets, for example,text lines:

"L A 16-25MPH" should return 2 groups: 16, 25 
"LIMITED ACCESS SPEED I-75" should return no matches.
"LMT ACC 6-10" should return 2 groups: 6, 10 
"6 OVER" should return 1 group: 6

I'm OK with all of the above situations, but I run into issues for strings with numbers that aren't related to speed, for example:

"LIMITED ACCESS SPEED I-75" should return no matches.

Sample Matches

MATCH 1
1.  [89-90] `16`
2.  [91-93] `25`

MATCH 2
1.  [193-194]   `6`
2.  [195-197]   `10`

MATCH 3
1.  [231-232]   `6`

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    -                        '-'
----------------------------------------------------------------------
    (                        group and capture to \2:
----------------------------------------------------------------------
      [0-9]+                   any character of: '0' to '9' (1 or
                               more times (matching the most amount
                               possible))
----------------------------------------------------------------------
    )                        end of \2
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    over                     'over'
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
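The node-by-node breakdown above can also be kept next to the pattern itself. The following is a sketch using Python's `re.VERBOSE` flag, which ignores whitespace in the pattern and allows comments. The name `SPEED_VERBOSE` is hypothetical.

```python
import re

SPEED_VERBOSE = re.compile(
    r"""
    ([0-9]+)          # group 1: one or more digits
    (?:               # non-capturing group with two alternatives
        - ([0-9]+)    #   a hyphen, then group 2: one or more digits
      |               #   OR
        \s* over      #   optional whitespace, then the word 'over'
    )
    """,
    re.VERBOSE | re.IGNORECASE,
)

print(SPEED_VERBOSE.search("L A 16-25MPH").groups())         # ('16', '25')
print(SPEED_VERBOSE.search("6 OVER").groups())               # ('6', None)
print(SPEED_VERBOSE.search("LIMITED ACCESS SPEED I-75"))     # None
```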