oligofren - 15 days ago
Git Question

Why does the file glob **/*.cs in git grep not show me all *.cs hits?

So I wanted to find the use of NLog in my project, and I employed git grep to do so for me, but it found a few more cases than I needed:

git grep NLog
GETA.Seo.Sitemap/Geta.SEO.Sitemaps.csproj: <Reference Include="NLog, Version=2.1.0.0, Culture=neutral, PublicKeyToken=5120e14c03d0593c, processorArchitecture=MSIL">
GETA.Seo.Sitemap/Geta.SEO.Sitemaps.csproj: <HintPath>..\packages\NLog.2.1.0\lib\net45\NLog.dll</HintPath>
GETA.Seo.Sitemap/Services/CloudinaryService.cs: NLogger.Exception("Could not transform image", exception);
GETA.Seo.Sitemap/Services/CloudinaryService.cs: NLogger.Warn("Url for cloudinary id was null");
GETA.Seo.Sitemap/Services/CloudinaryService.cs: NLogger.Warn("Could not locate file object for cloudinary id in EpiServer");
....
etc


Granted, it found what I was looking for, but I wanted to filter down to only the files ending in .cs. So I tried doing this:

git grep NLog **/*.cs
Web/Global.asax.cs: NLogger.Info("Meny application start");


Just one hit, and none of the .cs matches listed above showed up. I found this peculiar, and I have probably misunderstood how git grep matches globs. Could someone enlighten me?

Answer

(Terminology note, for anyone reading this answer: expanding things like *.cs is called "globbing",1 with *.cs being a "shell glob". A "shell" is your command line interpreter, which can be sh, bash, zsh, dash, tcsh, and so on. Git also has its own built-in globbing. The characters that get expanded are called wildcards, and they include *, ?, and [. Some shells also treat { specially, which is an issue when using Git's reflog names like master@{yesterday} or stash@{2}. Quoting is always available for all of these.)

The problem in this particular case—it may or may not happen to other people, depending on which shell they use and their circumstances—is that an unprotected (unquoted) * undergoes shell globbing. Some shells, such as bash, will, or at least can, expand ** the same way that Git does, meaning "recurse into subdirectories". Others can't, or depending on settings, won't.2

If your shell expands **/*.cs to include the name Web/Global.asax.cs but not to include GETA.Seo.Sitemap/Services/CloudinaryService.cs (because that's down one more level of directory), then by the time Git gets the names, it's too late: the wildcard * characters are gone. Git never sees them and cannot do its own globbing.
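To see what the shell itself does with the pattern, here is a minimal sketch. It assumes bash 4.0 or later and mktemp are available; the scratch directory and the variable names are made up for illustration, and the two empty files only mimic the paths from the question.

```shell
# Scratch tree mirroring two paths from the question (file contents do not matter here).
demo=$(mktemp -d)
mkdir -p "$demo/Web" "$demo/GETA.Seo.Sitemap/Services"
touch "$demo/Web/Global.asax.cs" "$demo/GETA.Seo.Sitemap/Services/CloudinaryService.cs"

# globstar off: bash treats ** like a single *, so the pattern
# reaches exactly one directory level down.
without_globstar=$(cd "$demo" && bash -c 'shopt -u globstar; printf "%s\n" **/*.cs')

# globstar on: ** recurses, so the deeper file is matched as well.
with_globstar=$(cd "$demo" && bash -c 'shopt -s globstar; printf "%s\n" **/*.cs')

printf 'globstar off:\n%s\n\nglobstar on:\n%s\n' "$without_globstar" "$with_globstar"
rm -rf "$demo"
```

With globstar off, only Web/Global.asax.cs is printed, which is the single file name Git ended up receiving in the question.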

The simple solution is to protect the wildcard characters from shell globbing, by quoting them:

git grep NLog '**/*.cs'

(Paired-up double quotes, as in git grep NLog "**/*.cs", also work in most shells. Prefix backslashes work in place of quotes too, as in git grep NLog \*\*/\*.cs: just protect each vulnerable character with a backslash. For many Git commands it's a good idea to protect all wildcard characters at all times, so that they pass through to Git, because Git will expand them against something other than the current work-tree. This matters less with git grep unless you're grepping older commits. The shell sees only the work-tree.3)
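The three ways of protecting the pattern can be compared in a throwaway repository. This is only a sketch: it assumes git and mktemp are on your PATH, the repository is a temporary stand-in rather than the asker's project, and the file contents are shortened copies of lines from the question.

```shell
# Throwaway repository with two of the files from the question
# (contents are shortened stand-ins).
repo=$(mktemp -d)
git init -q "$repo"
mkdir -p "$repo/Web" "$repo/GETA.Seo.Sitemap/Services"
echo 'NLogger.Info("Meny application start");' > "$repo/Web/Global.asax.cs"
echo 'NLogger.Warn("Url for cloudinary id was null");' > "$repo/GETA.Seo.Sitemap/Services/CloudinaryService.cs"
git -C "$repo" add .

# -l lists matching file names only; -- marks where the pathspecs begin.
single=$(cd "$repo" && git grep -l NLog -- '**/*.cs')    # single quotes
double=$(cd "$repo" && git grep -l NLog -- "**/*.cs")    # double quotes
escaped=$(cd "$repo" && git grep -l NLog -- \*\*/\*.cs)  # backslashes

printf 'single quotes:\n%s\n\ndouble quotes:\n%s\n\nbackslashes:\n%s\n' "$single" "$double" "$escaped"
rm -rf "$repo"
```

All three forms hand the literal text **/*.cs to Git, so each one lists both files, including the one two directories down.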

Although it's shell-dependent, sometimes a wildcard character will match nothing and then be passed through. For instance, if you have no directory named sub and you write sub/*, some—not all—shells will pass the literal text sub/* to the command you ran.4 In this case, if the command is a Git command, it can once again do its own globbing. It's not wise to depend on this, since as soon as there is something to match, the shell does the matching, instead of passing the original wildcard character on to the program.
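The pass-through behaviour, and why it is fragile, can be checked with a short sketch. It assumes bash and mktemp are available; the scratch directory and the file name a.cs are made up for illustration.

```shell
scratch=$(mktemp -d)   # starts out with no directory named sub

# Default bash behaviour: nothing matches, so the literal text is passed on.
passed=$(cd "$scratch" && bash -c 'printf "[%s]\n" sub/*')

# nullglob: the unmatched pattern is removed, so printf gets no argument at all.
dropped=$(cd "$scratch" && bash -c 'shopt -s nullglob; printf "[%s]\n" sub/*')

# failglob: bash reports an error and does not run the command.
(cd "$scratch" && bash -c 'shopt -s failglob; printf "[%s]\n" sub/*' 2>/dev/null) \
  && failed=no || failed=yes

# As soon as something matches, the shell expands the pattern itself.
mkdir "$scratch/sub" && touch "$scratch/sub/a.cs"
expanded=$(cd "$scratch" && bash -c 'printf "[%s]\n" sub/*')

printf 'default: %s\nnullglob: %s\nfailglob error: %s\nafter creating sub/a.cs: %s\n' \
  "$passed" "$dropped" "$failed" "$expanded"
rm -rf "$scratch"
```

The last case is the one to watch for: the same command line that used to reach the program as sub/* now arrives as sub/a.cs.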


1 The name "glob" is shortened from "global", and in very early shells the expansion was done by an external program named glob. Early versions of Unix ran on machines with as little as 64 kilobytes of memory, so there was not a lot of room for fancy in-shell expansion. See https://en.wikipedia.org/wiki/Glob_(programming) for more.

2 In bash, Git-style expansion of ** is controlled by the shell option globstar (enabled with shopt -s globstar, in bash 4.0 and later).

3 This might even include the .git repository subdirectory itself, which is generally bad. In bash, this is controlled by the shell option dotglob.

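4 In bash, this is controlled by the shell options failglob (an unmatched pattern is an error) and nullglob (an unmatched pattern is removed).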

Note that bash provides pretty much every possible behavior of every possible shell. It's attempting to be a sort of universal shell. Of course, this means it needs all these control options too, which makes bash quite big. You would never be able to run it on a 64K non-split-I&D PDP-11.
