Arun Sangal - 10 days ago
Linux Question

BASH =~ contains substring: "+" or a regex character behavior

I'm curious why the following does not work for the character +.

The failures for "\", "(" and "*" make sense to me: * expands to the files and folders in the current directory (during command-line shell expansion), and similarly \ and ( expect a closing character. My understanding, however, was that "+" should have worked the way "-" did.

PS: I know that putting the expansion in double quotes, i.e. "${o}", in the if statement makes every character in my test case below work. Using \${o} in the if statement, with or without double quotes, fails all the checks.

$ for o in - + \` ~ \~ , _ = / \\ ! @ \# $ \$ % ^ \& \* \( \); do a="a${o}b${o}c";if [[ $a =~ ${o} ]]; then echo "${o} exists in $a and =~ works"; else echo -e "\ncharacter ${o} doesn't work with =~\n"; fi; done
- exists in a-b-c and =~ works

character + doesn't work with =~

` exists in a`b`c and =~ works
/home/ubuntu exists in a/home/ubuntub/home/ubuntuc and =~ works
~ exists in a~b~c and =~ works
, exists in a,b,c and =~ works
_ exists in a_b_c and =~ works
= exists in a=b=c and =~ works
/ exists in a/b/c and =~ works

character \ doesn't work with =~

! exists in a!b!c and =~ works
@ exists in a@b@c and =~ works
# exists in a#b#c and =~ works
$ exists in a$b$c and =~ works
$ exists in a$b$c and =~ works
% exists in a%b%c and =~ works
^ exists in a^b^c and =~ works
& exists in a&b&c and =~ works

character * doesn't work with =~


character ( doesn't work with =~

) exists in a)b)c and =~ works

Answer

The fundamental misunderstanding behind this question is that =~ is a substring-search operator. It is not.

The right-hand side of =~ is evaluated as a POSIX extended regular expression (ERE). =~ is thus a regex-matching operator, which happens to be frequently used for substring searches when the right-hand side is quoted to make its contents literal (or when the string is known to match only itself when interpreted as an ERE).
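To make the regex-versus-literal distinction concrete, here is a minimal sketch. The helper names matches_regex and matches_literal are hypothetical and exist only for this illustration; the only difference between them is whether the right-hand side is quoted.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, named here only for illustration.

# Unquoted expansion: $2 is interpreted as a POSIX ERE.
matches_regex()   { [[ $1 =~ $2 ]]; }

# Quoted expansion: every character of $2 is matched literally.
matches_literal() { [[ $1 =~ "$2" ]]; }

# As an ERE, a+b means "one or more a, followed by b".
matches_regex   'aaab' 'a+b' && echo "ERE a+b matches aaab"
matches_regex   'a+b'  'a+b' || echo "ERE a+b does not match the string a+b"

# Quoted, a+b is just the three characters a, +, b.
matches_literal 'a+b'  'a+b' && echo "literal a+b is found in the string a+b"
matches_literal 'aaab' 'a+b' || echo "literal a+b is not found in aaab"
```

All four messages are printed, which shows that the same three characters behave as a pattern in one case and as a plain substring in the other.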


+, in regex, means "1-or-more of the preceding token" -- just as * means "0-or-more of the preceding token".

Thus, neither [[ $foo =~ + ]] nor [[ $foo =~ * ]] makes sense, because they ask for one-or-more (or zero-or-more) of a preceding token that doesn't exist at all.

Similarly, ( and ) have meaning in ERE as the beginning and end of a match group, so a bare (unescaped/unquoted) ( with no matching ) results in an invalid regex. An unmatched ) is treated as an ordinary character, which is why ) passed the test in the question.
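If you do want regex matching but need one of these metacharacters to stand for itself, it can be escaped inside the pattern. The following is a sketch under the assumption that the pattern is stored in a variable (the names s, re_plus, re_paren and re_star are made up for this example) and expanded unquoted so that it is still treated as an ERE:

```shell
#!/usr/bin/env bash
s='a+b(c*d'

# Backslash-escaped metacharacters, stored in variables and expanded unquoted.
re_plus='a\+b'
re_paren='b\(c'
# A bracket expression is another way to make a single character literal.
re_star='c[*]d'

for re in "$re_plus" "$re_paren" "$re_star"; do
  if [[ $s =~ $re ]]; then
    echo "matched: $re"
  else
    echo "no match: $re"
  fi
done
```

Each of the three patterns matches the sample string, because the escaped or bracketed character is taken literally while the rest of the pattern is still a regular expression.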

If you quote the expansion, by contrast, all characters contained will be treated as literal, rather than being treated as regular expression metacharacters, thus resulting in the presumably-intended behavior.


If you want to check whether a literal character is contained in a string, either quote it -- [[ $foo =~ "$o" ]] -- or use a glob-style pattern: [[ $foo = *"$o"* ]]
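Both suggestions can be tried on a few of the characters from the question. This is only a sketch: contains is a hypothetical helper name, and the character list is a shortened version of the original loop.

```shell
#!/usr/bin/env bash
# contains HAYSTACK NEEDLE: succeed if NEEDLE occurs literally in HAYSTACK.
# (Hypothetical helper name, using the glob-style form.)
contains() { [[ $1 = *"$2"* ]]; }

for o in - + '\' '*' '(' ')' '$'; do
  a="a${o}b${o}c"
  # Check with the glob form and with the quoted =~ form; both are literal.
  if contains "$a" "$o" && [[ $a =~ "$o" ]]; then
    printf '%s found in %s\n' "$o" "$a"
  else
    printf '%s NOT found in %s\n' "$o" "$a"
  fi
done
```

With the needle quoted in both forms, every character in the list is reported as found, including the ones that failed in the original test.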
