sarkon sarkon - 9 months ago 37
Ruby Question

regex strip all digits except ordinals

I'm looking for a regex to use with #gsub in Ruby that strips all digits from a string except those that are part of ordinals. Assume the following already preserves what I want to keep in a string:

string = "100 red balloons"
strip_digits = string.gsub(/[^a-zA-Z\s]/, '')
=> " red balloons"

How would I go about modifying the regex in strip_digits such that if:

string = "50th red balloon"

strip_digits would return:

=> "50th red balloon"

That is, the regex would ignore digits that are part of ordinals, while matching them otherwise.

For this example, it's safe to assume that any string of digits immediately followed by an ordinal indicator ("nd", "th", "rd", or "st") is an ordinal.


Just as a "fix" of your regex, I suggest:

input.gsub(/(\d+(?:th|[rn]d|st))|[^a-z\s]/i, "\\1")


The logic is as follows: the first alternative matches every number followed by an ordinal suffix and captures it into group 1, and the \1 backreference in the replacement pattern puts that text back unchanged. The second alternative, [^a-z\s] (or [^\p{L}\s]), matches every remaining character that is neither a letter nor whitespace; group 1 is empty for those matches, so they are removed.
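To see this on the two strings from the question, here is a minimal sketch. The constant ORDINAL_OR_JUNK and the method name strip_digits are just illustrative names, not anything Ruby provides:

```ruby
# Pattern from the answer: group 1 captures ordinals, the second
# alternative matches any other non-letter, non-whitespace character.
ORDINAL_OR_JUNK = /(\d+(?:th|[rn]d|st))|[^a-z\s]/i

# Hypothetical helper wrapping the gsub call.
def strip_digits(input)
  # "\\1" re-inserts group 1 when the ordinal branch matched;
  # when the second branch matched, group 1 is empty and the
  # character is dropped.
  input.gsub(ORDINAL_OR_JUNK, "\\1")
end

puts strip_digits("100 red balloons").inspect  # => " red balloons"
puts strip_digits("50th red balloon").inspect  # => "50th red balloon"
```

Note that in a double-quoted Ruby string the backslash has to be doubled ("\\1"); in a single-quoted string '\1' should work as well.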

Pattern details:

  • (\d+(?:th|[rn]d|st)) - Group 1: one or more digits (\d+) followed by th, rd, nd or st. The whole substring is stored in numbered buffer #1, which the \1 backreference in the replacement pattern reads back.
  • | - or
  • [^a-z\s] - any character that is neither an ASCII letter (the /i case-insensitive modifier makes a-z cover both lower- and uppercase) nor whitespace. To avoid removing Unicode letters, use \p{L} instead of a-z.
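The \p{L} variant mentioned in the last bullet might look like the sketch below. The names UNICODE_PATTERN and strip_digits_unicode are made up for this example, and the input string is an invented sample, not one from the question:

```ruby
# Same idea as the answer's pattern, but \p{L} (any Unicode letter)
# replaces a-z so accented and non-Latin letters survive.
UNICODE_PATTERN = /(\d+(?:th|[rn]d|st))|[^\p{L}\s]/i

# Hypothetical helper; assumes the input is a UTF-8 string.
def strip_digits_unicode(input)
  input.gsub(UNICODE_PATTERN, "\\1")
end

sample = "3 cafés on the 2nd floor"

# ASCII-only class from the answer: the "é" is removed as a non-letter.
puts sample.gsub(/(\d+(?:th|[rn]d|st))|[^a-z\s]/i, "\\1").inspect

# Unicode-aware class: the "é" is kept, the bare "3" is still stripped.
puts strip_digits_unicode(sample).inspect
```

Which variant is appropriate depends on whether the input is expected to contain non-ASCII letters.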