max max - 1 month ago 5x
Git Question

Git HEAD referring to branch vs to commit

According to Pro Git Book:

In Git, HEAD is a pointer to the local branch you’re currently on.

This is consistent with Git for Computer Scientists:

The HEAD ref is special in that it actually points to another ref. It is a pointer to the currently active branch.

But it turns out that:

HEAD is not the latest revision, it's the current revision. Usually, it's the latest revision of the current branch, but it doesn't have to be.

For example:

If you check out something older (such as a tag like git checkout v1.1) then your HEAD changes to the commit of that tag. It may not be the latest commit.

So HEAD can point either to a branch or to a commit. Is the behavior of git commands any different when HEAD refers to branch X compared to when HEAD refers directly to the head commit of branch X? (In C-like notation, I'm talking about the case when **HEAD refers to a certain commit vs. when *HEAD refers to the same commit.)


HEAD is just a reference, much like master or (if it exists) branch, but with two extra-special properties:

  1. HEAD is normally a symbolic reference (and normally, almost no other reference is symbolic; the usual exception is refs/remotes/origin/HEAD in a clone, and you can make any symbolic reference you like using git symbolic-ref). A symbolic reference is just a name that contains another name, instead of a hash ID. When reading or writing a symbolic reference, Git will usually say "oh, well, this one is symbolic, so I'll just go read or write the other one now."

    Obviously this can result in an infinite loop: If reference a says "look at b" and b says "look at a", you can chase back and forth forever. But as long as you either don't do this, or make HEAD the only symbolic ref, you'll be fine, because you can't make HEAD point back to HEAD. Also, symbolic references don't work terribly well: if you make branch glorp point to master and then ask to delete glorp, Git deletes master instead! We'll see when this is actually a good thing in a moment.

  2. The literal string HEAD is built in to many Git commands, and the file itself is so important—used in so many places—that it's actually a test for whether a directory is itself a Git repository. This means that if something (such as a particularly untimely crash) wipes out your HEAD file, Git will stop believing that your .git directory is a repository! (No big deal, usually: just put the file back and all is well again.)
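The difference between a symbolic HEAD and a detached one can be seen directly in the HEAD file. The following is a minimal sketch, not a definitive recipe: it assumes a throwaway repository in a temporary directory, Git's default file-based ref storage, a branch name forced to master to match this answer, and placeholder identity values.

```shell
set -e
repo=$(mktemp -d)                         # throwaway directory (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name used in this answer
git config user.name "Example"            # placeholder identity
git config user.email "example@example.com"
git commit -q --allow-empty -m "first"

# On a branch: HEAD holds a name, not a hash ID.
attached_file=$(cat .git/HEAD)            # ref: refs/heads/master
attached_ref=$(git symbolic-ref HEAD)     # refs/heads/master
tip=$(git rev-parse HEAD)                 # the hash found by following that name

# Detached: HEAD holds the raw commit ID, so it is no longer symbolic.
git checkout -q --detach
detached_file=$(cat .git/HEAD)
if git symbolic-ref -q HEAD >/dev/null; then
    detached_is_symbolic=yes
else
    detached_is_symbolic=no
fi

echo "attached: $attached_file"
echo "detached: $detached_file (symbolic: $detached_is_symbolic)"
```

In both states git rev-parse HEAD yields the same commit ID; only the route Git takes to reach it differs.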

Whenever you make a new commit, the underlying process that Git uses is:[1]

  1. Read a commit ID from HEAD. This is the current commit: if you're in "detached HEAD" mode, with a raw commit ID in HEAD, that's what Git gets. If you're on a branch, so that HEAD contains the name of the branch, Git follows the indirection to the branch name and reads that, giving the tip-most commit of that branch. Either way, that's the current commit.

  2. Write out all the trees needed to make the commit (git write-tree), and write the new commit itself (git commit-tree) with its parent ID set to the ID obtained in step 1 (plus any additional parents needed if this is to be a merge commit), its tree set to the ID obtained here in step 2, and its commit message set to whatever is appropriate.

  3. Write the new commit's ID, as obtained from git commit-tree, into HEAD. If HEAD is symbolic (i.e., you're on a branch), this writes instead to the branch name. Now the branch name points to the new tip-most commit of the branch!

    But note that, in step 3, if you're in "detached HEAD" mode, Git still writes the new ID to HEAD. The result is that HEAD points to the tip of the new branch. In other words, "detached HEAD" mode just means that HEAD contains the ID of the tip of an anonymous branch. Adding new commits works exactly the same as always, updating the current branch. It's just that the current branch has only the name HEAD. (This is a name, it's just not the name of a branch. Specifically, all branch names start with refs/heads/. Since HEAD doesn't, it's not a branch name, it's just a reference. If a name starts with refs/remotes/ it's a remote-tracking branch name, and if it starts with refs/tags/ it's a tag, but HEAD doesn't start with anything at all, so it's just a reference.)
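The three steps above can be walked through by hand with plumbing commands. This is only a sketch under stated assumptions (a scratch repository in a temporary directory, branch name forced to master, placeholder identity, a hypothetical file.txt); the real git commit also runs hooks, checks the index, and records a richer reflog entry.

```shell
set -e
repo=$(mktemp -d)                         # throwaway directory (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name used in this answer
git config user.name "Example"            # placeholder identity
git config user.email "example@example.com"
git commit -q --allow-empty -m "first"

# Step 1: read the current commit ID through HEAD.
parent=$(git rev-parse HEAD)

# Step 2: write the tree from the index, then the commit object itself.
echo hello > file.txt                     # hypothetical file, for illustration only
git add file.txt
tree=$(git write-tree)
new=$(git commit-tree "$tree" -p "$parent" -m "second")

# Step 3: write the new ID into HEAD. HEAD is symbolic here, so
# update-ref follows it and the write lands in refs/heads/master.
git update-ref -m "commit: second" HEAD "$new"

master_tip=$(git rev-parse refs/heads/master)
new_parent=$(git rev-parse "$new^")
head_target=$(git symbolic-ref HEAD)

echo "master now points to $master_tip, whose parent is $new_parent"
```

After step 3, HEAD itself still contains the name refs/heads/master; it is the branch that moved.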

Your objection can be rephrased another way as well:

But that means many branches can all point to one commit ID!

Exactly. This is entirely normal, and it happens every time you make a new branch:[2]

...--o--o--o     <-- HEAD, master
         \
          o      <-- branch

If HEAD is "detached" and we make a new commit:

             o   <-- HEAD
            /
...--o--o--o     <-- master
         \
          o      <-- branch
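
The detached case in the picture above can be reproduced in a scratch repository. This sketch assumes a temporary directory, a branch name forced to master, and placeholder identity values; it only checks that master stays put while HEAD advances.

```shell
set -e
repo=$(mktemp -d)                         # throwaway directory (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name used in this answer
git config user.name "Example"            # placeholder identity
git config user.email "example@example.com"
git commit -q --allow-empty -m "first"

master_before=$(git rev-parse refs/heads/master)

# Detach HEAD at the current commit, then commit on the "anonymous branch".
git checkout -q --detach
git commit -q --allow-empty -m "on the anonymous branch"

head_after=$(git rev-parse HEAD)
head_parent=$(git rev-parse HEAD^)
master_after=$(git rev-parse refs/heads/master)

echo "master: $master_after (unchanged)"
echo "HEAD:   $head_after (new commit, parent $head_parent)"
```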

If HEAD is not detached—if, instead, it points to master—and we do git checkout -b newbr before we make the new commit, then we start with this instead (and this time I'll draw HEAD -> newbr to indicate that HEAD is symbolic and points to newbr):

...--o--o--o     <-- HEAD -> newbr, master
         \
          o      <-- branch

and after the commit we have:

             o   <-- HEAD -> newbr
            /
...--o--o--o     <-- master
         \
          o      <-- branch

Note that in the "before" picture, we had three names for the current commit: HEAD, newbr, and master all pointed to it (though HEAD had to go through newbr first).
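
The before and after pictures for newbr can be checked the same way. A minimal sketch, assuming a scratch repository in a temporary directory, a branch name forced to master, and placeholder identity values:

```shell
set -e
repo=$(mktemp -d)                         # throwaway directory (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name used in this answer
git config user.name "Example"            # placeholder identity
git config user.email "example@example.com"
git commit -q --allow-empty -m "first"

git checkout -q -b newbr

# Before: three names (HEAD, newbr, master) for one commit.
before_head=$(git rev-parse HEAD)
before_newbr=$(git rev-parse refs/heads/newbr)
before_master=$(git rev-parse refs/heads/master)

git commit -q --allow-empty -m "second"

# After: HEAD still names newbr, newbr moved, master did not.
head_target=$(git symbolic-ref HEAD)
after_newbr=$(git rev-parse refs/heads/newbr)
after_master=$(git rev-parse refs/heads/master)

echo "HEAD -> $head_target"
echo "newbr:  $before_newbr -> $after_newbr"
echo "master: $before_master -> $after_master"
```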

[1] That is, this is the process for a normal git commit. If you use git commit --amend, this process is modified just a bit: instead of reading the ID from HEAD, Git looks up the current commit's parent(s), and uses those IDs as the parents in step 2. This means the new commit, once made, has the same parents as the current commit. By writing the new commit's ID through HEAD into the branch, this appears to have changed a commit. But it hasn't, really: it's just shoved the "old current" commit aside.

If you work through an example with two or more branch names pointing to the same commit, you'll see exactly how and why using git commit --amend on a published commit—a commit you've pushed to another repository, and that other people now have by name—can be problematic. (Exercise/hint: How many branch-name references are changed in step 3, when updating HEAD?)
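
One way to work through that exercise is the sketch below. It assumes a scratch repository in a temporary directory, a branch name forced to master, placeholder identity values, and two hypothetical names (the branch other and the file file.txt) that are not part of the discussion above.

```shell
set -e
repo=$(mktemp -d)                         # throwaway directory (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name used in this answer
git config user.name "Example"            # placeholder identity
git config user.email "example@example.com"
git commit -q --allow-empty -m "first"
echo hello > file.txt                     # hypothetical file, for illustration only
git add file.txt
git commit -q -m "second"

# A second branch name for the same commit (hypothetical name "other").
git branch other
old=$(git rev-parse HEAD)
old_parent=$(git rev-parse HEAD^)

# Amend while on master: a new commit is made with the old commit's parent.
git commit -q --amend -m "second, reworded"

new=$(git rev-parse HEAD)
new_parent=$(git rev-parse HEAD^)
master_tip=$(git rev-parse refs/heads/master)
other_tip=$(git rev-parse refs/heads/other)

echo "master: $master_tip (the amended commit)"
echo "other:  $other_tip (still the old commit)"
```

Only the branch that HEAD names is rewritten; every other name, including names in other repositories, keeps pointing at the old commit.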

[2] Unless, that is, you use git checkout --orphan. What this does is put HEAD into the same special state it has in a new, empty repository: HEAD now contains the name of a branch that doesn't actually exist yet. That is, it's a symbolic reference to a nonexistent branch. The three-step commit sequence above knows how to deal with the failure to read an ID from HEAD: it makes a new commit with no parent, then writes the new ID into HEAD, which has the side effect of actually creating the branch.

This solves the bootstrap problem with the new, empty repository: a branch name can only point to an actual commit; but master, in a new empty repository, can't point to any commit at all, because there simply aren't any. So a new repository doesn't actually have a master branch until you make the first commit, even though HEAD is set up so that you're on the master branch.
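
Both the empty-repository state and the --orphan state can be observed with a short sketch. It assumes a scratch repository in a temporary directory, a branch name forced to master, placeholder identity values, and a hypothetical branch name fresh.

```shell
set -e
repo=$(mktemp -d)                         # throwaway directory (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name used in this answer
git config user.name "Example"            # placeholder identity
git config user.email "example@example.com"

# New, empty repository: HEAD names master, but master does not exist yet.
head_target=$(git symbolic-ref HEAD)
if git show-ref --verify --quiet refs/heads/master; then
    exists_before=yes
else
    exists_before=no
fi

# The first commit has no parent, and writing it through HEAD creates the branch.
git commit -q --allow-empty -m "root"
if git show-ref --verify --quiet refs/heads/master; then
    exists_after=yes
else
    exists_after=no
fi
commit_count=$(git rev-list --count HEAD)

# --orphan puts HEAD back into the same state, under a new name.
git checkout -q --orphan fresh            # hypothetical branch name
orphan_target=$(git symbolic-ref HEAD)
if git show-ref --verify --quiet refs/heads/fresh; then
    orphan_exists=yes
else
    orphan_exists=no
fi

echo "before first commit: HEAD -> $head_target, branch exists: $exists_before"
echo "after first commit:  branch exists: $exists_after, commits: $commit_count"
echo "after --orphan:      HEAD -> $orphan_target, branch exists: $orphan_exists"
```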