posicore - 7 months ago
SQL Question

Javascript regex to validate received search string for MySQL full-text search

I need to validate whether an entered search string used for a full-text search (boolean mode) on a MySQL database is valid.

The minimum word length (ft_min_word_len) has been set to "2" and only the following operators are allowed to be used: +, -, ", *, (, )

The following validation rules are given:


  • it may contain several valid words separated by a blank

  • valid words contain at least two characters matching the following character set:
    [^+\s\-\>\<\(\)\~\*\:\"\&\|]



    • no whitespace, no asterisk, no parentheses, no plus sign, no minus sign, no double quotes


  • words may have either a leading plus sign or a leading minus sign

  • words may have a trailing asterisk

  • a phrase may be enclosed by double quotes


    • plus and minus signs enclosed by double quotes won't be interpreted as operators

    • the enclosed phrase may have either a leading plus sign or a leading minus sign


  • words may be enclosed by parentheses


    • the parenthesized words must not have any leading operators

    • the left parenthesis may have either a leading plus sign or a leading minus sign
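
As a rough illustration, the rules above could be written as named fragments that are then composed with the RegExp constructor. This is only a sketch under assumptions: the names WORD_CHAR, WORD, PHRASE, ITEM, GROUP, TERM and isValidSearch are hypothetical, terms are assumed to be separated by one or more blanks with no leading or trailing blanks, and words inside a quoted phrase are assumed to need at least two characters as well.

```javascript
// One "word" character, copied verbatim from the rule above.
const WORD_CHAR = String.raw`[^+\s\-\>\<\(\)\~\*\:\"\&\|]`;

// A word: at least two word characters, optionally followed by an asterisk.
const WORD = String.raw`${WORD_CHAR}{2,}\*?`;

// A quoted phrase: blank-separated chunks; operators inside quotes are plain text.
const PHRASE_WORD = String.raw`[^"\s]{2,}`;
const PHRASE = String.raw`"${PHRASE_WORD}(?: +${PHRASE_WORD})*"`;

// An item without a leading operator: a word or a phrase.
const ITEM = `(?:${WORD}|${PHRASE})`;

// A parenthesized group: items only, so no leading operators inside.
const GROUP = String.raw`\(${ITEM}(?: +${ITEM})*\)`;

// A term: optional leading + or -, then an item or a group.
const TERM = `[+-]?(?:${ITEM}|${GROUP})`;

// The whole search string: one or more terms separated by blanks.
const SEARCH_RE = new RegExp(`^${TERM}(?: +${TERM})*$`);

function isValidSearch(input) {
  return SEARCH_RE.test(input);
}

console.log(isValidSearch('+word1 +(word2 word3* word4*) -word5'));
console.log(isValidSearch('+word1 +(+word2 -word3 -word4) -word5'));
```

Writing the phrase as "chunk, then blank plus chunk, repeated" instead of nesting two quantifiers is a deliberate choice in this sketch, since it leaves the regex engine only one way to split the text.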




Valid search string examples that must pass the regex:

'word1 word2 word3 word4 word5'
'+word1 +word2 word3 word4 -word5'
'+word1 +"word2 word3 word4" -word5'
'+word1 +"word2+word3 word4" -word5'
'+word1 +"word2*word3 word4" -word5'
'+word1 +(word2 word3 word4) -word5'
'+word1 +(word2 word3* word4*) -word5'


Invalid search string examples that must not pass the regex:

'w word2 word3 word4 word5'
'wo*rd2 wo+rd3 wor(d)4 "word5'
'word1+ word2+ word3 word4 -word5'
'+word1 +"word2 word3 word4 -word5'
'+word1 +(word2 word3 word4 -word5'
'+word1 +(+word2 -word3 -word4) -word5'


I created a regex which works pretty well, but it's quite long, and the same word pattern is repeated several times because words may be enclosed by quotes or parentheses:

/^((?:[+\-]?(?:(?:[^+\s\-\>\<\(\)\~\*\:\"\&\|]{2,}\*?|(?:"(?:[^"\s]{2,}[ ]*)+"))|\((?:(?:[^+\s\-\>\<\(\)\~\*\:\"\&\|]{2,}\*?|(?:"(?:[^"\s]{2,}[ ]*)+"))[ ]*)+\))(?:[ ]+|$))+)$/
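
To check a regex like this against the example strings listed above, a small harness along the following lines may help. The regex is copied unchanged from the question; the names searchRe, validExamples, invalidExamples and report are made up for this sketch.

```javascript
// The regex from the question, copied verbatim.
const searchRe = /^((?:[+\-]?(?:(?:[^+\s\-\>\<\(\)\~\*\:\"\&\|]{2,}\*?|(?:"(?:[^"\s]{2,}[ ]*)+"))|\((?:(?:[^+\s\-\>\<\(\)\~\*\:\"\&\|]{2,}\*?|(?:"(?:[^"\s]{2,}[ ]*)+"))[ ]*)+\))(?:[ ]+|$))+)$/;

// Strings that are expected to pass.
const validExamples = [
  'word1 word2 word3 word4 word5',
  '+word1 +word2 word3 word4 -word5',
  '+word1 +"word2 word3 word4" -word5',
  '+word1 +"word2+word3 word4" -word5',
  '+word1 +"word2*word3 word4" -word5',
  '+word1 +(word2 word3 word4) -word5',
  '+word1 +(word2 word3* word4*) -word5',
];

// Strings that are expected to be rejected.
const invalidExamples = [
  'w word2 word3 word4 word5',
  'wo*rd2 wo+rd3 wor(d)4 "word5',
  'word1+ word2+ word3 word4 -word5',
  '+word1 +"word2 word3 word4 -word5',
  '+word1 +(word2 word3 word4 -word5',
  '+word1 +(+word2 -word3 -word4) -word5',
];

// Print one line per example, flagging any result that differs from the expectation.
function report(examples, expected) {
  for (const text of examples) {
    const actual = searchRe.test(text);
    console.log(actual === expected ? 'ok      ' : 'MISMATCH', JSON.stringify(text));
  }
}

report(validExamples, true);
report(invalidExamples, false);
```

The same two arrays can be reused to compare any shorter candidate regex against the original before swapping it in.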


You may test the regex at regex101.com: https://regex101.com/r/lA3vG4/4

I'm no regex expert, so I'd like to know whether there's a simpler regex that works in JavaScript.

EDIT: replaced single whitespace characters with \s, thanks to Rick James for this hint

EDIT2: update of reserved characters for MyISAM FULLTEXT. Thanks again to Rick James.

Answer

Consider simplifying by replacing whitespace tests with [[:space:]]. (I don't know if Javascript understands that notation.)
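
On the open question about notation: JavaScript regular expressions do not implement POSIX bracket classes such as [[:space:]]. Without the u or v flag, the pattern is read as an ordinary character class followed by a literal "]", so the closest JavaScript counterpart is \s. The short sketch below illustrates the difference; the variable names are made up for the example.

```javascript
// Parsed as the class [ [ : s p a c e ] followed by a literal "]".
const posixStyle = /[[:space:]]/;

// The JavaScript shorthand for whitespace characters.
const shorthand = /\s/;

console.log(posixStyle.test(' '));   // false: a blank is not in the class
console.log(posixStyle.test('a]'));  // true: "a" is in the class, then "]"
console.log(shorthand.test(' '));    // true
console.log(shorthand.test('\t'));   // true
```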