Papajohn000 Papajohn000 - 3 months ago 19
Git Question

What flavor of regex does git use?

I'm trying to use git diff --word-diff-regex=, and it seems to reject every kind of lookahead and lookbehind. I'm having trouble pinning down which flavor of regex Git uses. For example:

git diff --word-diff-regex='([.\w]+)(?!>)'

Comes back as an invalid regular expression.

I am trying to match all the words that are not HTML tags, so the matches should be 'Hello', 'World', 'Foo', and 'Bar' for the string below:

<p> Hello World </p><p> Foo Bar </p>


The Git source uses regcomp and regexec, which are defined by POSIX 1003.2. The code to compile a diff regexp is:

            if (regcomp(ecbdata->diff_words->word_regex,
                        o->word_regex,
                        REG_EXTENDED | REG_NEWLINE))

which in POSIX means that these are "extended" regular expressions (EREs), as defined in the POSIX regular expression specification.

(Not every C library actually implements the same POSIX REG_EXTENDED. Git includes its own implementation, which can be built in place of the system's.)

Edit (per updated question): POSIX EREs have neither lookahead nor lookbehind, nor do they have \w (but [_[:alnum:]] is probably close enough for most purposes).
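
If you want to check a pattern outside of Git, here is a minimal sketch that compiles it with the same two flags quoted above and counts the matches in a string. The helper names ere_compiles and count_matches are made up for this example and are not part of Git. The result also depends on which regex library you link against, so treat it as a rough check rather than a guarantee of what Git will accept.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `pattern` compiles as a POSIX ERE
 * with the flags from the Git snippet above, 0 otherwise. */
int ere_compiles(const char *pattern)
{
    regex_t re;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE) != 0)
        return 0;
    regfree(&re);
    return 1;
}

/* Hypothetical helper: counts successive non-overlapping matches of
 * `pattern` in `text`; returns -1 if the pattern does not compile.
 * Simplification: REG_NOTBOL is not set on later calls, so patterns
 * that use ^ are not handled faithfully. */
int count_matches(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE) != 0)
        return -1;

    while (*text != '\0' && regexec(&re, text, 1, &m, 0) == 0) {
        if (m.rm_eo == 0) {
            /* Empty match at the current position: step past one
             * character so the loop always makes progress. */
            text++;
            continue;
        }
        count++;
        text += m.rm_eo; /* resume right after this match */
    }

    regfree(&re);
    return count;
}
```

For example, ere_compiles("([.\\w]+)(?!>)") should return 0 with a conforming ERE implementation such as glibc's, while ere_compiles("[_[:alnum:]]+") returns 1. Note that [_[:alnum:]]+ still matches the p inside each tag of the sample string. One possible workaround, not tested against Git here, is to make a whole tag count as a single word with alternation, something like '<[^>]*>|[_[:alnum:]]+'.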