lyke lyke - 6 months ago 21
Bash Question

What does this regex mean: (.*\1)

I googled for a regex that checks whether a string occurs at least twice in a line,
and found this example:

egrep "(\w{2}).*\1" file

But I couldn't understand "(\w{2}).*\1".

Can someone explain it in detail or point me to a reference page?

Answer
  1. (\w{2}) matches exactly two word characters ({2} is the quantifier). A word character is any of A-Z, a-z, 0-9, or underscore. The parentheses also make this a capturing group: the matched characters are remembered and can be referenced again with a numbered backreference, in this case \1.
  2. .* matches zero or more of any character.
  3. \1 matches the exact text that the 1st group captured, not just any two word characters.

Therefore the regex matches any two consecutive word characters that appear again, after zero or more other characters, on the same line.
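The explanation above can be checked with a small non-interactive run. This is only a sketch: it assumes GNU grep (where \w and backreferences work with -E), and has_repeated_pair is a hypothetical helper name, not something from the question. It uses grep -E, which is the current spelling of egrep, and single quotes so the shell leaves the backslashes alone.

```shell
# Hypothetical helper: prints only the input lines in which some pair of
# consecutive word characters occurs again later on the same line.
# Assumes GNU grep (\w and \1 inside -E are GNU extensions).
has_repeated_pair() {
    grep -E '(\w{2}).*\1'
}

# 'abcdef' has no repeated pair; 'a;a;a' never has two word characters
# in a row, so (\w{2}) cannot match at all. Both lines are dropped.
printf '%s\n' 'ab;;ab' 'abcdef' '12abcd123' 'a;a;a' | has_repeated_pair
```

With GNU grep this should print only ab;;ab and 12abcd123, which shows that \1 requires the same two characters again rather than any two word characters.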

$ egrep "(\w{2}).*\1"
ab;;ab
ab;;ab
abcdab
abcdab
12ab12
12ab12
12abcd123
12abcd123
abab
abab
$

Inputs and matched output:

  1. ab;;ab captured group \1: ab and matched string is ab;;ab
  2. abcdab captured group \1: ab and matched string is abcdab
  3. 12ab12 captured group \1: 12 and matched string is 12ab12
  4. 12abcd123 captured group \1: 12 and matched string is 12abcd12
  5. abab captured group \1: ab and matched string is abab
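To see the matched string rather than the whole line, the -o option can be added so that grep prints only the part of the line the regex matched. Again a sketch assuming GNU grep; matched_part is a hypothetical helper name.

```shell
# Hypothetical helper: prints only the portion of each line that the
# regex matched (-o), instead of the whole matching line.
# Assumes GNU grep.
matched_part() {
    grep -oE '(\w{2}).*\1'
}

# For '12abcd123' the match ends at the second '12', so the trailing
# '3' is not part of the matched string.
printf '%s\n' '12abcd123' 'abab' | matched_part
```

With GNU grep this should print 12abcd12 and abab, in line with items 4 and 5 above.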

As pointed out, more information on the meta/special characters can be found here
