shrimpdrake - 1 year ago
PHP Question

Building a complex regex with "conditions"

I'm trying to build a complex regex with the following constraints:

1. My string can only be composed of:

"Regular" alphanumeric characters: a-zA-Z0-9

4 special characters: space . _ -

2. The length has to be between 3 and 25 characters.

So far it's quite easy, but then it gets complicated:

3. There cannot be 2 consecutive special characters, unless the 1st one is a space and the 2nd one isn't a space. Logical consequence: there cannot be 3 consecutive special characters.

4. The string cannot start or end with a space.

I'm especially struggling with rule 3.
Any help/hint would be much appreciated.

" lkjsdi1SD" => FALSE (starts with a space)
"-lkjsdi1SD" => TRUE
"lkjsd -i1SD " => FALSE (ends with a space)
".Dg5 -lkjsdi1SD" => TRUE
"jhv5675gjjvghHJHvg655775vfFVHFJFf445576JHFFfhd12" => FALSE (too long)
"jhv  12" => FALSE (two consecutive spaces)
"as" => FALSE (too short)
"a r" => TRUE

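As a reference point for rule 3, the rules can also be checked without a regex, which makes it easy to test any candidate pattern against them. The sketch below is an editor's illustration, not part of the question or the answer: the function name isValidName is hypothetical, and it assumes byte-wise (ASCII) input. Rule 3 becomes a check on each pair of neighbouring special characters.

```php
<?php
// Hypothetical reference checker (editor's sketch, not from the Q&A):
// encodes rules 1-4 with a plain loop so a regex can be tested against it.
function isValidName(string $s): bool
{
    $len = strlen($s); // byte length; valid input is ASCII-only anyway

    // Rule 2: length between 3 and 25
    if ($len < 3 || $len > 25) {
        return false;
    }
    // Rule 4: no leading or trailing space
    if ($s[0] === ' ' || $s[$len - 1] === ' ') {
        return false;
    }

    $prevSpecial = null; // previous char if it was special, otherwise null
    for ($i = 0; $i < $len; $i++) {
        $c = $s[$i];
        $o = ord($c);
        $isAlnum = ($o >= 48 && $o <= 57)  // 0-9
            || ($o >= 65 && $o <= 90)      // A-Z
            || ($o >= 97 && $o <= 122);    // a-z
        if ($isAlnum) {
            $prevSpecial = null;
            continue;
        }
        // Rule 1: the only other allowed characters are space . _ -
        if (strpos(' ._-', $c) === false) {
            return false;
        }
        // Rule 3: two specials in a row only as "space, then a non-space"
        if ($prevSpecial !== null && !($prevSpecial === ' ' && $c !== ' ')) {
            return false;
        }
        $prevSpecial = $c;
    }
    return true;
}

foreach ([' lkjsdi1SD', '-lkjsdi1SD', 'jhv  12', 'as', 'a r'] as $s) {
    printf("%s => %s\n", var_export($s, true), isValidName($s) ? 'TRUE' : 'FALSE');
}
```

Read literally, rules 3 and 4 also accept a string that ends with a space followed by ., _ or - (for example "a -"), and this sketch accepts it as well.
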
Answer

I suggest using:

^                       # Start of string
 (?=.{3,25}$)           # The total string length is from 3 to 25
 [._-]?                 # An optional . _ or - (? means "match 1 or 0 times")
 [a-zA-Z0-9]+           # one or more alphanumeric symbols
 (?:                    # Zero or more sequences of:
    (?:[._-]|[ ][._-]?)   # one . _ or - OR a space followed by an optional . _ or -
    [a-zA-Z0-9]+          # one or more alphanumerics
 )*                     # (here * defines zero or more times)
 [._-]?                 # one optional . _ or -  
$                       # End of string

See the inline comments for a description of each part. I used the /x (VERBOSE, or free-spacing) modifier, which allows whitespace and comments inside the pattern and helps keep long patterns readable.
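To use the commented pattern from PHP as written, its line breaks must reach PCRE intact and the x modifier must be set. A minimal sketch, assuming a nowdoc string (so nothing is interpolated) and ~ as the delimiter (any delimiter that does not occur in the pattern would do); the variable names are illustrative:

```php
<?php
// The answer's free-spacing pattern in a nowdoc; the trailing "x" modifier
// makes PCRE ignore the unescaped whitespace and the # comments.
$pattern = <<<'REGEX'
~
^                       # Start of string
 (?=.{3,25}$)           # The total string length is from 3 to 25
 [._-]?                 # An optional . _ or -
 [a-zA-Z0-9]+           # One or more alphanumerics
 (?:                    # Zero or more sequences of:
    (?:[._-]|[ ][._-]?) #   one . _ or - OR a space followed by an optional . _ or -
    [a-zA-Z0-9]+        #   one or more alphanumerics
 )*
 [._-]?                 # One optional . _ or -
$                       # End of string
~x
REGEX;

$samples = [
    ' lkjsdi1SD', '-lkjsdi1SD', 'lkjsd -i1SD ', '.Dg5 -lkjsdi1SD',
    'jhv5675gjjvghHJHvg655775vfFVHFJFf445576JHFFfhd12', 'jhv  12', 'as', 'a r',
];

foreach ($samples as $s) {
    // preg_match returns 1 on a match, 0 on no match, false on error
    $ok = preg_match($pattern, $s) === 1;
    printf("%s => %s\n", var_export($s, true), $ok ? 'TRUE' : 'FALSE');
}
```
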


More pattern details

  • ^ - the start-of-string anchor: the regex engine only tries to match the whole pattern at the start of the string. Thus, if the string starts with a space, no match is returned, because [a-zA-Z0-9]+, the first obligatory subpattern, requires an alphanumeric, and [._-]? only allows one ., _ or - before the first alphanumeric ([._-] is a character class matching one of these 3 characters, and the ? quantifier matches one or zero occurrences of the quantified subpattern).
  • (?=.{3,25}$) - a positive lookahead, anchored at the start, that requires at least 3 and at most 25 characters other than a newline from the start to the end of the string (. matches any character except LF unless the /s modifier is set). $ is the end-of-string anchor: it matches at the very end of the string or before a final newline character, so replace both $ anchors with \z if you want to reject a string with a trailing newline. {3,25} is a limiting quantifier that matches between the minimum and maximum number of occurrences of the quantified subpattern. Note that a lookahead does not consume text: the regex engine tests the lookahead pattern, returns to the position where the lookahead started with a true or false result, and, if true, goes on matching the rest of the pattern.
  • [._-]? - an optional single character, one of the characters defined in the character class (see the explanation above).
  • [a-zA-Z0-9]+ - one or more characters from the ranges defined in the character class (the + quantifier matches 1 or more occurrences).
  • (?:(?:[._-]|[ ][._-]?)[a-zA-Z0-9]+)* - a non-capturing group, used only to group subpatterns so they are matched in sequence, that matches zero or more (due to the * after it) sequences of (?:[._-]|[ ][._-]?)[a-zA-Z0-9]+:
      • (?:[._-]|[ ][._-]?) - either a ., _ or -, OR (due to the | alternation operator) a space followed by an optional ., _ or -. I put the space into a character class [ ] because the /x VERBOSE modifier, which I used to add line breaks and comments to the pattern, ignores unescaped whitespace outside character classes; without /x you may use a regular space.
      • [a-zA-Z0-9]+ - 1 or more (due to +) alphanumerics.
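The $ versus \z remark can be checked directly. The sketch below writes the pattern on one line without /x (so a plain space replaces [ ]) and compares it with a copy in which both $ anchors are replaced by \z. It also shows one edge case: rules 3 and 4, read literally, allow a string to end with a space followed by ., _ or - (e.g. "a -"), which the final [._-]? cannot match. The $variant pattern, whose final part is (?: ?[._-])?, is an editor's assumption about how that case could be covered, not part of the answer.

```php
<?php
// One-line form of the answer's pattern (no x modifier, so a plain space is used).
$dollar = '/^(?=.{3,25}$)[._-]?[a-zA-Z0-9]+(?:(?:[._-]| [._-]?)[a-zA-Z0-9]+)*[._-]?$/';

// Same pattern with both $ anchors replaced by \z: a trailing newline is rejected.
$strict = '/^(?=.{3,25}\z)[._-]?[a-zA-Z0-9]+(?:(?:[._-]| [._-]?)[a-zA-Z0-9]+)*[._-]?\z/';

// Editor's assumption: an optional space before the final . _ or -.
$variant = '/^(?=.{3,25}\z)[._-]?[a-zA-Z0-9]+(?:(?:[._-]| [._-]?)[a-zA-Z0-9]+)*(?: ?[._-])?\z/';

$input = "abc\n"; // a valid name followed by a trailing newline

var_dump(preg_match($dollar, $input)); // int(1): $ also matches before a final "\n"
var_dump(preg_match($strict, $input)); // int(0): \z only matches at the very end

var_dump(preg_match($dollar, 'a -'));  // int(0): the final [._-]? cannot match " -"
var_dump(preg_match($variant, 'a -')); // int(1)
```
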