samz_manu samz_manu - 4 months ago 17
Ruby Question

Ruby parsing and regex

Picked up Ruby recently and have been fiddling around with it. I wanted to learn how to use regex or other Ruby tricks to check for certain words, whitespace characters, valid format etc in a given text line.

Let's say I have an order list that looks strictly like this in this format:

cost: 50 items: book,lamp

One space after semicolon, no space after each comma, no trailing whitespaces at the end and stuff like that.
How can I check for errors in this format using Ruby? This for example should fail my checks:

cost: 60 items:shoes,football

My goal was to split the string by a " " and check to see if the first word was "cost:", if the second word was a number and so on but I realized that splitting on a " " doesn't help me check for extra whitespaces as it just eats it up. Also doesn't help me check for trailing whitespaces. How do I go about doing this?


You could use the following regular expression.

r = /
    \A                # match beginning of string     
    cost:\s           # match "cost:" followed by a space
    \d+\s             # match > 0 digits followed by a space
    items:\s          # match "items:" followed by a space
    [[:alpha:]]+      # match > 0 lowercase or uppercase letters
    (?:,[[:alpha:]]+) # match a comma followed by > 0 lowercase or uppercase 
                      # letters in a non-capture group (?: ... )
    *                 # perform the match on non-capture group >= 0 times
    \z                # match the end of the string
    /x                # free-spacing regex definition mode

"cost: 50 items: book,lamp"         =~ r #=> 0   (a match, beginning at index 0)
"cost: 50 items: book,lamp,table"   =~ r #=> 0   (a match, beginning at index 0)
"cost:     60 items:shoes,football" =~ r #=> nil (no match)

The regex can can of course be written in the normal manner:

r = /\Acost:\s\d+\sitems:\s[[:alpha:]]+(?:,[[:alpha:]]+)*\z/


r = /\Acost: \d+ items: [[:alpha:]]+(?:,[[:alpha:]]+)*\z/

though a whitespace character (\s) cannot be replaced by a space in the free-spacing mode definition (\x).