rich22 - 4 months ago
Git Question

Explaining cat .git/config output for tracking remote branches?

I'm working through a Git tutorial, and it recently showed the output of cat .git/config for tracking remote branches, as seen below. I understand that branch "master" is the master branch and that origin refers to the remote-tracking branch on the local computer, but can someone explain what the fetch, merge and remote options are? (I understand the rest.)

$ cat .git/config
[core]
    repositoryformatversion = 0
    filemode = false
    bare = false
    logallrefupdates = true
    symlinks = false
    ignorecase = true
[remote "origin"]
    url = https://github.com/rich44/explore_california.git
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
    remote = origin
    merge = refs/heads/master

Answer

Technically, each of these is just a section with variables, so it's not really true that, e.g., [branch "master"] is the master branch. It is just a pair of settings that can be spelled out as:

branch.master.remote origin
branch.master.merge refs/heads/master
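The step from the bracketed file layout to these dotted names can be sketched in a few lines of Python. This is only an illustration, not how Git itself parses the file: the flatten helper and its regular expression are my own invention, and it ignores escapes, continuation lines, and other details of the real format.

```python
import re

# Hypothetical helper: matches [section] and [section "subsection"] headers.
SECTION = re.compile(r'^\[([\w.-]+)(?:\s+"(.*)")?\]$')

def flatten(config_text):
    """Return (dotted_name, value) pairs in file order."""
    pairs = []
    prefix = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":   # skip blank lines and comments
            continue
        header = SECTION.match(line)
        if header:
            section, subsection = header.group(1), header.group(2)
            prefix = section if subsection is None else f"{section}.{subsection}"
            continue
        key, _, value = line.partition("=")
        pairs.append((f"{prefix}.{key.strip()}", value.strip()))
    return pairs

SAMPLE = '''
[core]
    bare = false
[remote "origin"]
    url = https://github.com/rich44/explore_california.git
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
    remote = origin
    merge = refs/heads/master
'''

for name, value in flatten(SAMPLE):
    print(name, value)
```

The last two lines printed are the branch.master.remote and branch.master.merge settings, in the same spelled-out form as above.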

Your question effectively works out to some number of parts, which I'll try to take in a sensible order, although Git being what it is, sometimes no order is sensible. :-)

Configuration sections

Git likes[1] to talk about "sections" in the configuration. Specifically, the git config command has --rename-section and --remove-section. A section is basically just the part inside square brackets, which is a syntax originally stolen from the INI file format.

Beyond this, however, Git internally doesn't actually care about sections. Each program simply queries with either a fixed string, such as core.bare or branch.master.remote, or using a regular expression or similar, such as core\..*, which matches everything in the [core] section. Here the backslash is required to protect the first dot . character; the second . character means "match any one character" and the asterisk * (called a Kleene star in formal language theory) means "repeat it zero or more times". Hence this matches any string starting with core. and continuing on for zero or more characters. (Most of the C code inside Git that matches items in sections is considerably cruder than this; only git config itself and various scripts that run git config really use the full power of regular expressions.)
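To see why the backslash matters, here is a small sketch using Python's re module as a stand-in for whatever regular expression engine is doing the matching. The list of names is made up for the demonstration (corex.bare is not a real Git setting), and fullmatch is used so that the whole name has to match.

```python
import re

# Made-up list of dotted names; "corex.bare" exists only to show the difference.
names = ["core.bare", "core.filemode", "corex.bare", "remote.origin.url"]

escaped = re.compile(r"core\..*")    # literal dot, then anything
unescaped = re.compile(r"core..*")   # ANY character, then anything

print([n for n in names if escaped.fullmatch(n)])
# ['core.bare', 'core.filemode']
print([n for n in names if unescaped.fullmatch(n)])
# ['core.bare', 'core.filemode', 'corex.bare']
```

Without the backslash, the first dot happily matches the x in corex, which is not what was intended.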

The ability to scan for matches allows Git to, for instance, find all remotes, which are simply all the names in each [remote "..."] section. (The double quotes here evolved because INI-file syntax forbids certain characters that are allowed in remote names and branch names and so on. Inside the quotes, a literal double quote or backslash has to be written with a backslash escape, as \" or \\.)
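Finding the remotes by scanning names might look roughly like the following. This is a sketch under my own assumptions, not Git's actual code: remote_names is a hypothetical helper, and it takes the setting name to be whatever follows the last dot, so that a remote name containing dots still comes out whole.

```python
def remote_names(dotted_names):
    """Collect <name> from every remote.<name>.<variable> key, in first-seen order."""
    seen = []
    for full in dotted_names:
        if not full.startswith("remote."):
            continue
        middle = full[len("remote."):]
        # The variable is the part after the LAST dot; the rest is the remote's name.
        name, dot, _variable = middle.rpartition(".")
        if dot and name not in seen:
            seen.append(name)
    return seen

keys = [
    "core.bare",
    "remote.origin.url",
    "remote.origin.fetch",
    "remote.bob.fetch",
    "branch.master.remote",
]
print(remote_names(keys))
```

Here the two remotes found are origin and bob; the core and branch entries are simply skipped.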


[1] I realize I'm anthropomorphizing Git here. It's a useful metaphor. But remember, don't anthropomorphize computers: they hate that!


Items in a remote "..." section

Because INI syntax is so flexible, you can put anything you like into this section. Things Git does not know or care about, Git simply ignores. For instance, you can edit your config to contain:

[remote "origin"]
    abracadabra = magic word
    hello = kitty

and these two settings will be ignored. So it's more interesting to find out what items Git will actually pay any attention to. (Any list we make is necessarily incomplete, because Git grows new "items to pay attention to" over time: at one point, Git looked for url but not for pushurl. At a previous $job, many years ago, I added code to our Git scripts to check for a pushurl. Then a new version of Git, 1.6.4, came out, that used pushurl in precisely the same way I had used it. [Clearly we had the same idea.] If we choose our new items well, we'll probably end up agreeing with the Git folks as to their meanings, as happened here, but it's always a bit of a risk, adding new things.)

Nonetheless, here's a partial list, mostly copied from the git config documentation:

  • url: The default place to fetch from, and usually, to push to.
  • pushurl: If set, the default place to push to.
  • mirror: If set, make pushes to this remote use --mirror by default.
  • proxy: Provide a proxy setting for libcurl.
  • fetch: Provide a default refspec for git fetch (may be repeated).
  • push: Provide a default refspec for git push (may be repeated).
  • prune: If set, make fetch from this remote use --prune by default.

The one you specifically asked about—the fetch line—supplies refspecs for git fetch. See below for more about refspecs.
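The "ignore what you don't know" behavior and the url versus pushurl fallback can be sketched as below. Both function names are hypothetical, the set of known keys is just the partial list above, and the ssh address is an invented example rather than anything from the question.

```python
# Keys from the partial list above; real Git pays attention to more than these.
KNOWN_REMOTE_KEYS = {"url", "pushurl", "mirror", "proxy", "fetch", "push", "prune"}

def recognized(settings):
    """Keep only the settings this sketch knows about; the rest is ignored."""
    return {k: v for k, v in settings.items() if k in KNOWN_REMOTE_KEYS}

def push_destination(settings):
    """Use pushurl when it is set, otherwise fall back to url."""
    known = recognized(settings)
    return known.get("pushurl") or known["url"]

origin = {
    "url": "https://github.com/rich44/explore_california.git",
    "abracadabra": "magic word",   # ignored
    "hello": "kitty",              # ignored
}
print(recognized(origin))
print(push_destination(origin))

# Hypothetical pushurl, only to show the override.
origin["pushurl"] = "ssh://git@example.com/explore_california.git"
print(push_destination(origin))
```

With only url set, pushes go to the same place as fetches; once pushurl is set, it wins for pushing while fetching still uses url.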

Items in a branch "..." section

Again, there is quite a long list of things you can set that Git does actually pay attention to. These are mostly, though perhaps never completely, documented in that same git config documentation.

  • remote: The default remote to use when fetching and pushing.
  • merge: The name of the upstream branch as seen on the remote Git. Together with remote, this tells git pull which branch to merge or rebase onto, and it can also affect git push. This gets a bit complicated in complicated cases: see the description of refspecs below.
  • rebase: A setting (default false; other options include true, merges and, in older versions of Git, preserve) for how git pull should run its second half (remember that git pull is essentially shorthand for "first run git fetch, then run another, second, Git command").
  • description: A short descriptive string inserted into format-patch cover letters and pull requests.
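As a rough illustration of how the remote, merge and rebase items combine when you run git pull, here is a sketch. The pull_plan function is hypothetical, and it simplifies heavily: any rebase value other than the string false is treated as "rebase", whereas real Git accepts several spellings of booleans and has more options.

```python
def pull_plan(config, branch):
    """Describe the two halves of `git pull` for one branch (simplified)."""
    remote = config[f"branch.{branch}.remote"]
    upstream = config[f"branch.{branch}.merge"]
    rebase = config.get(f"branch.{branch}.rebase", "false")
    # Simplification: anything other than "false" selects rebase.
    second_step = "merge" if rebase == "false" else "rebase"
    return {"fetch_from": remote, "upstream_ref": upstream, "second_step": second_step}

config = {
    "branch.master.remote": "origin",
    "branch.master.merge": "refs/heads/master",
}
print(pull_plan(config, "master"))

config["branch.master.rebase"] = "true"
print(pull_plan(config, "master")["second_step"])
```

With the settings from the question, the first half fetches from origin and the second half is a merge; setting rebase to true switches the second half to a rebase.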

A thing to know about branch names

Git originally had branches and tags. To keep them separate, Git stored branches in refs/heads/ and tags in refs/tags/. These were—and in fact, still are—just directories within .git, although today there are also "packed references" stored in .git/packed-refs, which saves time and disk space if you have thousands of rarely-updated branches and never-updated tags.

What this means, though, is that when you see a branch name like master, Git actually sees refs/heads/master. This is not only where Git stores the hash ID for the branch, it's also how Git knows that it is a branch in the first place. You type in master; Git searches in .git/refs/ and .git/packed-refs, and comes up with refs/heads/master; and the refs/heads/ part tells Git: aha, this is a branch!
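The search can be sketched as trying a fixed list of prefixes in order and taking the first full name that exists. The order below follows the one given in the gitrevisions documentation; the resolve function and the set of existing references (including the v1.0 tag) are made up for the example, and real Git of course looks in .git/refs/ and .git/packed-refs rather than in a Python set.

```python
# Order in which a short name is expanded, per the gitrevisions documentation.
CANDIDATES = [
    "{}",
    "refs/{}",
    "refs/tags/{}",
    "refs/heads/{}",
    "refs/remotes/{}",
    "refs/remotes/{}/HEAD",
]

def resolve(short_name, existing_refs):
    """Return the first full reference name that exists, or None."""
    for template in CANDIDATES:
        full = template.format(short_name)
        if full in existing_refs:
            return full
    return None

# Hypothetical repository contents.
refs = {"refs/heads/master", "refs/remotes/origin/master", "refs/tags/v1.0"}

print(resolve("master", refs))         # refs/heads/master
print(resolve("origin/master", refs))  # refs/remotes/origin/master
print(resolve("v1.0", refs))           # refs/tags/v1.0
```

The prefix of whatever comes back is what tells you the kind of name you have.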

When Git acquired remote-tracking branches, they were easy to add, because of this little bit of planning ahead. A remote-tracking branch is simply a name stored in refs/remotes/. To this prefix, Git adds the name of the remote itself, so that origin/master goes in refs/remotes/origin/master. This full name is what tells Git that the name is in fact a remote-tracking branch in the first place.
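Put another way, the full name carries its own type tag. A minimal sketch, with hypothetical helper names classify and tracking_name:

```python
PREFIXES = [
    ("refs/heads/", "branch"),
    ("refs/tags/", "tag"),
    ("refs/remotes/", "remote-tracking branch"),
]

def classify(full_name):
    """Return (kind, short_name) based purely on the reference's prefix."""
    for prefix, kind in PREFIXES:
        if full_name.startswith(prefix):
            return kind, full_name[len(prefix):]
    return "other", full_name

def tracking_name(remote, branch):
    """Build the full name of the remote-tracking branch for remote/branch."""
    return f"refs/remotes/{remote}/{branch}"

print(classify("refs/heads/master"))
print(classify(tracking_name("origin", "master")))
print(classify("refs/notes/commits"))
```

The second call shows origin/master coming back as a remote-tracking branch, and the third shows that a name outside these three namespaces is none of the above.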

refspecs

To really understand refspecs, it helps to know a bit of Git history. Back in the Dim Time, before Git had remotes and remote-tracking branches, people had a collection of various kludges to deal with pushing and pulling changes from other Git repositories. There was a git fetch then, but it was a difficult-to-use "plumbing" program, meant mainly for scripts to use. It dumped its results into a file called FETCH_HEAD. The front end "nice" command was git pull. This is why Git has push and pull as the obvious—but wrong!—opposites, and why people are usually introduced to transferring data with push and pull instead of using push and fetch, which is actually the better way to go at first. The nice version of fetch did not exist yet. Today's git fetch still dumps its results into FETCH_HEAD, but also behaves better.

Because remote-tracking branches did not exist, refspecs did not really need to exist then either. If you were fetching from Bob's repository, you had a nicer front end script that ran git fetch. The fetch reached into Bob's computer somehow (via ssh or http or https or whatever, just as is still done today) and extracted his branches and tags and dropped everything into your FETCH_HEAD file. Your front-end script then extracted whatever looked interesting and let you merge, or rebase, or whatever you intended to do.

Note that during this process, you don't have to care one bit how Bob names his branches. Your front end script just looks into FETCH_HEAD. That file is completely separate from your branches, and you—or your front end script—can throw away Bob's branch names as soon as you are done looking.

Your nice front-end pull script, of course, does need to know Bob's name for your branch. Let's say you call your branch paris, while Bob calls his asteroid_433. You always want to merge his into yours, so you just configure your branch.paris.merge to asteroid_433.

This whole process was pretty messy. Someone came along and invented remote-tracking branches, which really was quite a brilliant idea. Instead of having to remember that Bob calls this asteroid_433, why not just get everything from Bob? For each name you get, if it's a branch (that is, if it starts with refs/heads/), just drop it into your repository under the name refs/remotes/bob/whatever. Now you can easily see all of Bob's branches any time. You'll still need to remember the mapping ("my paris = Bob's asteroid_433") but most of the time you two will probably use the same name.

Imagine you're the person inventing this wonderful new feature. Of course, the pull script already exists. You can't change branch.paris.merge: it's still going to say refs/heads/asteroid_433, which is the name on Bob's computer. And, you're not sure if this is how you want to do it. Maybe you'd like to have git fetch grab Bob's asteroid_433 and rename it to paris, so that you get refs/remotes/bob/paris as the remote-tracking branch.

Enter the refspec.[2] The refspec is, in this case, simply a pair of names separated by a colon, and optionally prefixed with a plus sign. The name on the left is the "source" and the name on the right is the "destination". The source is Bob's name for his branch, and the destination is your remote-tracking branch name for the remote you call bob.

To make this work nicely, you put in pattern matching. For whatever reason, you don't use regular expressions, but rather use shell style glob expressions. (In versions of Git before 2.9 or so these are even further limited. They work well enough though.) You still have the name on the left, refs/heads/*, and the name on the right, refs/remotes/bob/*, but now the left side * means "match anything", and the right side * means "replace this with what you matched on the left".
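Here is a minimal sketch of that matching rule for a single refspec. It assumes at most one * on each side and is not Git's actual implementation; parse_refspec and map_ref are names I made up.

```python
def parse_refspec(spec):
    """Split '+src:dst' into (force, src, dst)."""
    force = spec.startswith("+")
    if force:
        spec = spec[1:]
    src, _, dst = spec.partition(":")
    return force, src, dst

def map_ref(spec, ref):
    """Return the destination name for ref, or None if the source side does not match."""
    _force, src, dst = parse_refspec(spec)
    if "*" not in src:
        return dst if ref == src else None
    head, _, tail = src.partition("*")
    long_enough = len(ref) >= len(head) + len(tail)
    if long_enough and ref.startswith(head) and ref.endswith(tail):
        matched = ref[len(head):len(ref) - len(tail)]   # what the left * matched
        return dst.replace("*", matched, 1)             # goes where the right * is
    return None

spec = "+refs/heads/*:refs/remotes/bob/*"
print(map_ref(spec, "refs/heads/master"))     # refs/remotes/bob/master
print(map_ref(spec, "refs/heads/feature/x"))  # refs/remotes/bob/feature/x
print(map_ref(spec, "refs/tags/v1.0"))        # None
```

Note that the * here happily matches across slashes, and that a tag name does not match the source side at all, so this refspec leaves tags alone.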

This produces your remote-tracking branches, which git fetch now updates. To store your new refspecs, you add fetch configuration entries to the [remote "bob"] section.

To allow for multiple renamings, you make sure Git reads all the fetch = lines, so that someone can write:

[remote "bob"]
    fetch = +refs/heads/asteroid_433:refs/remotes/bob/paris
    fetch = +refs/heads/master:refs/remotes/bob/master

and so on. But in practice, most remotes ended up with just one fetch line.
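The difference between "read all the lines" and the more usual "last one wins" can be sketched like this. The get_all and get_last helpers are hypothetical, loosely modeled on the idea that a setting may appear more than once in the flattened configuration.

```python
# Flattened settings in file order; fetch appears twice on purpose.
pairs = [
    ("remote.bob.fetch", "+refs/heads/asteroid_433:refs/remotes/bob/paris"),
    ("remote.bob.fetch", "+refs/heads/master:refs/remotes/bob/master"),
    ("branch.paris.merge", "refs/heads/asteroid_433"),
]

def get_all(pairs, name):
    """Every value for a multi-valued setting, in file order."""
    return [value for key, value in pairs if key == name]

def get_last(pairs, name):
    """The usual rule for single-valued settings: the last occurrence wins."""
    values = get_all(pairs, name)
    return values[-1] if values else None

print(get_all(pairs, "remote.bob.fetch"))
print(get_last(pairs, "branch.paris.merge"))
```

For fetch, both refspecs come back and both get applied; treating it as single-valued would silently drop the paris renaming.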

(Allowing multiple lines is good forethought, as it turns out that today, we sometimes want to add refspecs beginning with +refs/notes/... so that we can bring over Git notes. You can't know this yet, but obviously you're pretty smart, coming up with remote-tracking branches in the first place. :-) )

Of course, the old branch.paris.merge syntax has to stick around, because you can't change existing Git users' configurations. So now Git must, when it goes to use this merge value, map the value through the same fetch refspecs in order to figure out the correct remote-tracking branch name. (The old pull script didn't bother: it just got the value from FETCH_HEAD directly. That script has somewhat recently been rewritten as a C program, and it is no longer obvious what it does. The merge and rebase commands, when run standalone, do in fact do this mapping, as they must.)
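That mapping step, from the merge value to a remote-tracking branch name, can be sketched as follows. Again this is an illustration under my own assumptions (one * per side, first matching refspec wins), with hypothetical function names, and the glob is translated to a regular expression just to keep the code short.

```python
import re

def map_through(refspecs, ref):
    """Map ref through the first fetch refspec whose source side matches."""
    for spec in refspecs:
        src, _, dst = spec.lstrip("+").partition(":")
        # Turn the glob into a regex: escape everything, then let * capture.
        pattern = re.escape(src).replace(r"\*", "(.*)")
        m = re.fullmatch(pattern, ref)
        if m:
            return dst.replace("*", m.group(1), 1) if m.groups() else dst
    return None

def upstream(config, branch):
    """Find the remote-tracking name for a branch from its remote and merge settings."""
    remote = config[f"branch.{branch}.remote"]
    merge = config[f"branch.{branch}.merge"]
    return map_through(config[f"remote.{remote}.fetch"], merge)

config = {
    "remote.origin.fetch": ["+refs/heads/*:refs/remotes/origin/*"],
    "remote.bob.fetch": [
        "+refs/heads/asteroid_433:refs/remotes/bob/paris",
        "+refs/heads/master:refs/remotes/bob/master",
    ],
    "branch.master.remote": "origin",
    "branch.master.merge": "refs/heads/master",
    "branch.paris.remote": "bob",
    "branch.paris.merge": "refs/heads/asteroid_433",
}

print(upstream(config, "master"))  # refs/remotes/origin/master
print(upstream(config, "paris"))   # refs/remotes/bob/paris
```

So the merge setting keeps holding the name as seen on the other Git, and the fetch lines are what translate it into a name in your own repository.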

I should mention the leading + here as well. This is simply the force flag for the given refspec. It's the equivalent of git push --force: it means "this reference should be updated no matter what", as compared to the more usual rule for branch name updates, which are allowed if the update adds new commits, but rejected if the update would "lose" existing commits (e.g., if you're picking up an upstream git reset of some sort). Normally every fetch = refspec for each remote has the leading +, since you always want all your remote-tracking branches updated to whatever commit their branch says is the correct one right now.
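The rule the + overrides can be sketched with a toy commit graph. Everything here is invented for the illustration: the single-letter commit names, the parents table, and the two helper functions. The "adds new commits only" test is modeled as "the old commit is an ancestor of the new one".

```python
def is_ancestor(parents, old, new):
    """True if old is reachable from new by following parent links."""
    stack, seen = [new], set()
    while stack:
        commit = stack.pop()
        if commit == old:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False

def update_allowed(parents, old, new, force):
    """Allow the update if forced, or if it only adds commits on top of old."""
    return force or is_ancestor(parents, old, new)

# Toy history: A <- B <- C, plus D, which also sits on A but not on B or C.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["A"]}

print(update_allowed(parents, "B", "C", force=False))  # True: C builds on B
print(update_allowed(parents, "C", "D", force=False))  # False: B and C would be "lost"
print(update_allowed(parents, "C", "D", force=True))   # True: the + says do it anyway
```

The second case is the "upstream reset" situation: without the +, your remote-tracking name would stay stuck on C even though the other Git has moved to D.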


[2] Actually, it already existed, for git push purposes, because git push always needed it. But it got refined somewhat.