I was wondering how all the tools know what branches/commits are merged until I found a "Merge" header in the commit:

    Merge: 0123456 9876543
    Author: Some Body <...>
    Date: Fri ....

My question is: why doesn't a git merge --squash add this header?

git merge --squash merges (verb), but does not make a merge (noun).
The "merge header" you mention is not in the commits in that form. Instead, it's just something that git log prints when it comes across a merge commit. To really cement this idea, let's look at what merging does (what "to merge" means as a verb) and what a merge is. More properly, let's look at what a merge commit is, or what "merge" means as an adjective modifying the word "commit". We'll be lazy, like good Git users, though, and shorten it from "merge commit" to "a merge", making it a noun.
Let's look at the noun usage first, since that's simpler. In Git, a merge commit is simply a commit with at least two parent commits. It really is that simple. Every commit has some number of parents. A root commit has zero parents (and most repositories have only one root commit, which is the first commit ever made, which obviously cannot have had a parent commit). Most commits are ordinary commits with just one parent, and all the other commits have two or more parents[1] and are therefore merge commits. Then we shorten the phrase "merge commit" to just "a merge", changing "merge" from an adjective (modifying "commit") to a noun (meaning "a merge commit").
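One way to see both points at once is to dump a merge commit's raw object with git cat-file -p and compare it with what git log prints. The following is a minimal sketch, assuming a POSIX shell and a reasonably recent Git on the PATH; it builds a throwaway repository in a temporary directory, and the branch names (main, topic), file names, and Demo identity are made up for the illustration:

```shell
set -e
# Throwaway repository; identity, branch and file names are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > a.txt && git add a.txt && git commit -q -m base
git checkout -q -b topic
echo topic > b.txt && git add b.txt && git commit -q -m topic
git checkout -q main
echo main > c.txt && git add c.txt && git commit -q -m main
git merge -q --no-edit topic   # both sides moved on, so this makes a real merge commit

# The stored commit object: tree, two "parent" lines, author, committer, message.
git cat-file -p HEAD
parents=$(git cat-file -p HEAD | grep -c '^parent ')
stored_merge=$(git cat-file -p HEAD | grep -c '^Merge:' || true)

# git log derives its "Merge:" line from those parent lines.
git log -1 HEAD | head -n 3
log_merge=$(git log -1 HEAD | grep -c '^Merge:')
echo "parents=$parents stored_merge=$stored_merge log_merge=$log_merge"
```

The raw object should show two parent lines and no Merge: line, while git log -1 shows a Merge: line computed from those parents.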
If a merge commit is a commit with two parents, then git merge must make merge commits, and to merge should mean "to make a merge commit". And it does, but only sometimes. We will see when (and, more importantly, why) soon.
[1] Other VCSes may stop at two. Mercurial does, for instance: no hg commit ever has more than two parents. Git, however, allows any commit to have any number of parents. A Git commit with three or more parents is called an octopus merge, and normally these commits are made using git merge -s octopus or equivalent, e.g., git merge topic1 topic2 topic3: running git merge with extra commit specifiers implies -s octopus. These do nothing that cannot be done using a series of two-parent merges, so Git's octopus merges do not give it more power than Mercurial, but octopus merges are sometimes convenient, and good for showing off your Git Fu. :-)
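The octopus case can be checked the same way. A sketch under the same kind of assumptions (POSIX shell, Git on the PATH, a throwaway repository, hypothetical branch names topic1, topic2, topic3); main gets a commit of its own first so that the merge cannot fast-forward:

```shell
set -e
# Throwaway repository; identity, branch and file names are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt && git add base.txt && git commit -q -m base
for t in topic1 topic2 topic3; do
  git checkout -q -b "$t" main          # each topic forks from the base commit
  echo "$t" > "$t.txt" && git add "$t.txt" && git commit -q -m "$t"
done
git checkout -q main
echo main > main.txt && git add main.txt && git commit -q -m main

# Several commit arguments: Git picks the octopus strategy by itself.
git merge -q --no-edit topic1 topic2 topic3
parents=$(git cat-file -p HEAD | grep -c '^parent ')
echo "HEAD has $parents parents"
```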
The verb form of to merge is much more complicated, or at least it is in Git. We can distinguish between two major forms of merging as well: there's the act of merging source code changes, and then there's the act of merging branches. (This brings in the question of "what is a branch". There's a lot more to this question than you might think, so see What exactly do we mean by "branch"?) Many Git commands can do the verb kind of merging, including git cherry-pick and git revert, both of which are essentially a form of git apply -3.[2] Of course git merge is the most obvious way to do it, and when it does, you get the verb form of merging changes.
Merging source changes is more properly called "three-way merging". For more about this, see the Wikipedia article[3] or VonC's answer to Why is a 3-way merge advantageous over a 2-way merge? The details can get quite complex, but the goal of this merge process is simple enough: we want to combine changes made to some common base. That is, given some starting point B, we find changes C1 and C2 and combine them to make a single new change C3, then apply C3 to the base B to get a new version.
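Git exposes the file-level three-way merge directly as git merge-file, which takes the current file, the base, and the other file, and writes the combined result into the current file. A minimal sketch, assuming a POSIX shell and Git on the PATH; the file names and contents are invented, with ours.txt standing for B plus C1 and theirs.txt for B plus C2:

```shell
set -e
dir=$(mktemp -d) && cd "$dir"
printf 'one\ntwo\nthree\nfour\nfive\n' > base.txt     # B, the common base
printf 'ONE\ntwo\nthree\nfour\nfive\n' > ours.txt     # B with change C1 (first line)
printf 'one\ntwo\nthree\nfour\nFIVE\n' > theirs.txt   # B with change C2 (last line)

# git merge-file <current> <base> <other>: combine C1 and C2, overwrite <current>.
# Exit status 0 means there were no conflicts.
git merge-file ours.txt base.txt theirs.txt
merged=$(cat ours.txt)
printf '%s\n' "$merged"
```

Because the two changes touch different lines, the result should be B with both C1 and C2 applied, with no conflict markers.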
In Git, the act of merging sources (doing the three-way merge itself) uses Git's index, also called the "staging area". Normally, the index has just one entry for each file that will go into the next commit. When you git add a file, you tell Git to update the staged version of the file, replacing the one in the current index, or if the file was previously "untracked", adding the file to the index so that it is now tracked and also staged for the next commit. During the merge process, however, the index has up to three entries for each file: one from the merge base version, one entry for the file-version to be treated as change #1 (also called "ours"), and one for change #2 ("theirs"). If Git is able to combine the changes on its own, it replaces[4] the three entries with one regular (staged-for-commit) entry. Otherwise it stops with a conflict, leaving the conflict marked in the work-tree version of the file. You must resolve the conflict yourself (presumably editing the file in the process), and use git add to replace the three special conflicted-merge index versions with the one normal staged version.
Once all conflicts are resolved, Git will be able to make a new commit.
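The extra index entries can be observed with git ls-files --stage, whose third column is the stage number. A sketch, assuming a POSIX shell and Git, in a throwaway repository where both sides rewrite the same line of a made-up file f.txt so that the merge must stop:

```shell
set -e
# Throwaway repository; identity, branch and file names are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

printf 'base\n' > f.txt && git add f.txt && git commit -q -m base
git checkout -q -b topic
printf 'theirs\n' > f.txt && git commit -q -am theirs
git checkout -q main
printf 'ours\n' > f.txt && git commit -q -am ours

git merge -q topic || true      # same line changed on both sides: the merge stops
git ls-files --stage f.txt      # stage 1 = merge base, 2 = ours, 3 = theirs
stages=$(git ls-files --stage f.txt | awk '{print $3}' | tr '\n' ' ')

printf 'resolved\n' > f.txt     # resolve by hand...
git add f.txt                   # ...and git add collapses the entries to stage 0
after=$(git ls-files --stage f.txt | awk '{print $3}')
git commit -q -m "merge topic"
```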
The last thing a normal, merge-making git merge does is to make the merge commit. Again, if there were no conflicts, Git can just do this on its own: git merge merges the changes, stages each merge result (as git add would) to update the index, and runs git commit for you. Or, if there were conflicts, you fix them, you run git add, and you run git commit. In all cases, it's actually git commit, rather than git merge itself, that makes the merge commit.
This last part is actually very easy, since the index / staging-area is all set up. Git just makes a commit as usual, except that instead of giving it the current (HEAD) commit's ID as its one single parent, it gives it at least two parent commit IDs: the first one is the HEAD commit as usual, and the rest come from a file left behind by git merge (.git/MERGE_HEAD).
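The hand-off from git merge to git commit can be watched by stopping the merge early with --no-commit. A sketch, assuming a POSIX shell, Git on the PATH, and a throwaway repository with placeholder names:

```shell
set -e
# Throwaway repository; identity, branch and file names are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > a.txt && git add a.txt && git commit -q -m base
git checkout -q -b topic
echo topic > b.txt && git add b.txt && git commit -q -m topic
git checkout -q main
echo main > c.txt && git add c.txt && git commit -q -m main
before=$(git rev-parse HEAD)

git merge -q --no-commit topic     # do the merge work, stop before committing
merge_head=$(cat .git/MERGE_HEAD)  # the extra parent ID that git commit will read
echo "MERGE_HEAD: $merge_head"
git commit -q -m "merge topic"     # git commit, not git merge, writes the merge commit
first=$(git rev-parse 'HEAD^1')
second=$(git rev-parse 'HEAD^2')
```

HEAD^1 should be the old main tip and HEAD^2 the ID that was stored in MERGE_HEAD.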
[2] Specifically, git apply -3, which can be spelled out as git apply --3way, tells git apply to use the index line in git diff output to construct a merge base if needed. When doing this with git cherry-pick and git revert, to turn them into merges (instead of straightforward patches), Git winds up using the parent commit of the cherry-picked or reverted commit. It's worth noting here that Git does this only on a per-file basis, after treating the patch as a simple patch has failed. Using the parent commit's file as a base version for a three-way merge will normally help only if that commit is an ancestor of the current (HEAD) commit. If it's not actually such an ancestor, combining the diff generated from "base" to HEAD with the patch being applied is probably not helpful. Still, Git will do it as a fallback.
[3] As usual for Wikipedia, I spotted some minor inaccuracies in it just now (for instance, it's possible to have more than two DAG LCAs), but don't have time to work on it, and it's not a bad overview.
[4] Often, it never bothers to make the conflicted entries in the first place. Git will, if possible, short-cut away even the git diff phase. Suppose, for instance, that the base commit has four files in it: u.txt is unchanged from base in either commit, one.txt is changed from base to HEAD but not from base to the other commit, two.txt is changed from base to the other commit but not from base to HEAD, and three.txt is changed in both. Git will simply copy u.txt straight through, take one.txt from HEAD, take two.txt from the other commit, and only bother to generate diffs, then try to merge them, for three.txt. This goes pretty fast, but does mean that if you have your own special three-way-merge program for these files, it never gets run for one.txt or two.txt, only for three.txt. I am not sure off-hand whether Git makes the multiple index entries before attempting the merging of diffs, or after attempting and failing. It does, however, have to make all three entries before running custom merge drivers.
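The per-file short cut can be imitated by comparing blob IDs for each path in the merge base, HEAD, and the other commit. This is only a sketch of the decision described in the footnote, not Git's actual implementation; the helper classify, the branch name other, and the four files are hypothetical, and a POSIX shell with Git is assumed:

```shell
set -e
# Throwaway repository; identity, branch and file names are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

printf 'u\n' > u.txt; printf 'one\n' > one.txt
printf 'two\n' > two.txt; printf 'three\n' > three.txt
git add . && git commit -q -m base
git checkout -q -b other
printf 'two, changed\n' > two.txt
printf 'three, theirs\n' > three.txt
git commit -q -am "other side"
git checkout -q main
printf 'one, changed\n' > one.txt
printf 'three, ours\n' > three.txt
git commit -q -am "our side"

base=$(git merge-base HEAD other)

# Hypothetical helper, not Git's own code: decide per path by comparing blob IDs.
classify() {
  b=$(git rev-parse "$base:$1")   # blob in the merge base
  o=$(git rev-parse "HEAD:$1")    # blob in HEAD ("ours")
  t=$(git rev-parse "other:$1")   # blob in the other commit ("theirs")
  if [ "$o" = "$t" ]; then echo "$1: same on both sides, keep it"
  elif [ "$b" = "$t" ]; then echo "$1: only HEAD changed, take HEAD"
  elif [ "$b" = "$o" ]; then echo "$1: only other changed, take other"
  else echo "$1: changed on both sides, needs a real three-way merge"
  fi
}
result=$(for f in u.txt one.txt two.txt three.txt; do classify "$f"; done)
printf '%s\n' "$result"
```

Only three.txt falls through to the last branch, which matches the footnote's point that a custom merge driver would run for that file alone.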
The above sequence is how normal merges work, and how Git makes merge commits: check out some commit (usually a branch tip), run git merge on another commit (usually some other branch tip), find a suitable merge base, make two sets of diffs, combine the diffs, and commit the result. We merge (as a verb) the changes, and make a merge (adjective) commit (or "make a merge", noun). But, as we noted earlier, git merge doesn't always do this.
Sometimes, git merge says it's "doing a fast-forward merge". This is a little bit of a misnomer, because "fast-forwarding" is more accurately considered a property of a branch label change, in Git. There are two other commands that use this property, git fetch and git push, which distinguish between a normal (or "fast-forward") branch update and a "forced" update. A proper discussion of fast-forwarding requires getting into the details of the commit graph, so all I will say here is that it occurs when you move a branch label from commit O (old) to commit N (new), and commit N has commit O as an ancestor.
When git merge detects that your merge argument is one of these cases, that is, that HEAD is an ancestor of this other commit, it will normally invoke this fast-forward operation instead. In this case, Git just uses the commit you told it to merge. There's no new commit at all, just the re-use of some existing commit. Git does not make a merge commit, nor does it do any merging-as-a-verb. It just changes you over to the new commit, almost as if by git reset --hard: moving the current branch label and updating the work-tree.
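A fast-forward can be confirmed by checking ancestry first with git merge-base --is-ancestor and then counting commits before and after the merge. A sketch, assuming a POSIX shell, Git, a throwaway repository with placeholder names, and no merge.ff setting in the user's configuration that would change the default:

```shell
set -e
# Throwaway repository; identity, branch and file names are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > a.txt && git add a.txt && git commit -q -m base
git checkout -q -b topic
echo more > b.txt && git add b.txt && git commit -q -m "topic work"
git checkout -q main

# HEAD (main) is an ancestor of topic, so a fast-forward is possible.
git merge-base --is-ancestor main topic && echo "fast-forward possible"
count_before=$(git rev-list --count --all)
git merge -q topic               # default behaviour: just move main forward
count_after=$(git rev-list --count --all)
echo "commits before: $count_before, after: $count_after"
```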
You can suppress this fast-forward action with --no-ff.[5] In this case, git merge will make a new merge commit even if a fast-forward is possible. You get no merge-as-a-verb action (there's no work to do), but you do get a new merge commit, and Git updates your work-tree to match.
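With --no-ff in the same situation, a merge commit appears even though nothing had to be combined. A sketch, assuming a POSIX shell, Git, and a throwaway repository with placeholder names; comparing tree IDs shows that no merge work happened:

```shell
set -e
# Throwaway repository; identity, branch and file names are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > a.txt && git add a.txt && git commit -q -m base
git checkout -q -b topic
echo more > b.txt && git add b.txt && git commit -q -m "topic work"
git checkout -q main

git merge -q --no-ff --no-edit topic   # force a merge commit anyway
parents=$(git cat-file -p HEAD | grep -c '^parent ')
# No merge work was needed: the new commit's tree is exactly topic's tree.
merge_tree=$(git rev-parse 'HEAD^{tree}')
topic_tree=$(git rev-parse 'topic^{tree}')
echo "parents=$parents merge_tree=$merge_tree topic_tree=$topic_tree"
```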
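Note that we have covered two of three cases here:
- verb plus noun (a normal merge): combine changes and make a merge commit;
- noun without verb: git merge --no-ff when a fast-forward was possible.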
The missing third case is verb-without-noun: how do we get the action of a merge, combining changes, without the noun/adjective form of a merge commit? This is where "squash merges" come in. Running git merge --squash <commit-specifier> tells Git to do the merge action as usual, but not to record the other branch / commit-ID (it does not write .git/MERGE_HEAD), so that the final git commit makes a normal, non-merge, single-parent commit.
That's really it—that's all it does! It just makes a normal, non-merge commit at the end. Oddly, it forces you to make that commit, instead of making it on its own. (There is no fundamental reason that it has to do this, and I don't know why the Git authors chose to make it behave this way.) But these are all mechanisms, not policies: they tell you how to make various kinds of commits, but not which ones you should make, or when, or—most important—why.
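The squash case, sketched under the same assumptions (POSIX shell, Git, a throwaway repository, placeholder names): the merge work is done and staged, no MERGE_HEAD is written, and the commit you then make has a single parent:

```shell
set -e
# Throwaway repository; identity, branch and file names are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > a.txt && git add a.txt && git commit -q -m base
git checkout -q -b topic
echo one > b.txt && git add b.txt && git commit -q -m "topic 1"
echo two >> b.txt && git commit -q -am "topic 2"
git checkout -q main
echo main > c.txt && git add c.txt && git commit -q -m main

git merge -q --squash topic      # merge work happens and is staged; no commit yet
if [ -e .git/MERGE_HEAD ]; then merge_head=present; else merge_head=absent; fi
git commit -q -m "squash-merge topic"   # you make the commit yourself
parents=$(git cat-file -p HEAD | grep -c '^parent ')
echo "MERGE_HEAD $merge_head; new commit has $parents parent(s)"
```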
[5] You can tell git merge that it should only proceed if it can fast-forward: git merge --ff-only. If the new commit is fast-forward-able, git merge updates to it. Otherwise it simply fails. I made an alias, git mff, that does this, since normally I want to git fetch and then see whether I need to merge, rebase, make a new branch entirely, or whatever. If I can fast-forward, I don't need to do anything else, so if git mff works, I'm done.
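The answer does not show how the mff alias is defined; a plausible definition is git config alias.mff 'merge --ff-only'. The sketch below assumes that hypothetical definition (set repository-locally rather than with --global), a POSIX shell, and Git, and tries it on a diverged branch and on a branch that is strictly ahead:

```shell
set -e
# Throwaway repository; identity, branch and file names are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > a.txt && git add a.txt && git commit -q -m base
git checkout -q -b topic
echo topic > b.txt && git add b.txt && git commit -q -m topic
git checkout -q main
echo main > c.txt && git add c.txt && git commit -q -m main

# Hypothetical alias definition (repository-local here).
git config alias.mff 'merge --ff-only'

# main and topic have diverged: --ff-only refuses and changes nothing.
if git mff topic; then topic_result=merged; else topic_result=refused; fi

# A branch that is strictly ahead of main can be fast-forwarded.
git checkout -q -b ahead
echo ahead > d.txt && git add d.txt && git commit -q -m ahead
git checkout -q main
if git mff ahead; then ahead_result=merged; else ahead_result=refused; fi
echo "topic: $topic_result, ahead: $ahead_result"
```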
The why question is hard, and like all philosophy questions, has no one right answer (but definitely a bunch of wrong ones :-) ). Consider this fact: every time you use git merge at all, you could have done something different and gotten the same source code to go with your newest commit. There are three successful outcomes for a git merge (that is, a merge where you do not git merge --abort to end it, but rather conclude it successfully):
- a fast-forward, which makes no new commit at all;
- a regular merge, which makes a new merge commit;
- a squash merge, which makes a new ordinary, single-parent commit.
The only difference between these three (aside from the obvious "no new commit at all" for the first one) is the record they leave behind in the commit graph.[6] A fast-forward obviously leaves no record: the graph is unchanged from before, because you added nothing. If that's what you want, that's what you should use. In a repository where you are following someone else's work and never doing anything of your own, this is probably what you want. It is also what you will get by default, and Git will "just work" for you.[7]
If you do a regular merge, that leaves a record of the merge. All the existing commits remain exactly as they are, and Git adds one new commit, with two[8] parents. Anyone coming along later will see just who did what, when, how, etc. If this is what you want, this is what you should do. Of course, some tools (like git log) will show who did what, when, etc., which, by showing a complete picture of all of history, may obscure the Big Picture view with all the little details. That's both the up-side and the down-side, in other words.
If you do a squash merge, that leaves no record of the merge. You make a new commit that picks up every merge-as-a-verb action, but the new commit is not a merge-as-a-noun. Anyone coming along later will see all the work that went in, but not where it came from. Tools like git log cannot show the little details, and you (and everyone else) will get only a Big Picture. Again, that is both the up-side and the down-side. But the down-side is perhaps a bit bigger, because if you find that you need those details later, they are not there. They are not only not there in the git log view, they are also not there for a future git merge.
If you are never going to do a future git merge of the squashed-in changes, that might not be a problem. If you plan to delete that branch entirely, giving up all the individual changes as individuals and keeping only the single collective squash-merge change, the "bad" part of doing the git merge --squash has essentially zero "badness value". If you intend to keep working on that branch, though, and merge it again later, that particular badness value increases hugely.
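The effect on a later merge can be shown directly: after a squash merge, the merge base between main and the branch is still the original fork point, so Git compares both sides against that old base again. A sketch, assuming a POSIX shell, Git, and a throwaway repository with a made-up file whose single line is changed repeatedly:

```shell
set -e
# Throwaway repository; identity, branch and file names are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

printf 'v1\n' > f.txt && git add f.txt && git commit -q -m base
fork=$(git rev-parse HEAD)
git checkout -q -b topic
printf 'v2\n' > f.txt && git commit -q -am "topic: v2"

# Squash-merge topic into main: main gets v2, but no link back to topic.
git checkout -q main
git merge -q --squash topic && git commit -q -m "squash-merge topic"

# Keep working on topic.
git checkout -q topic
printf 'v3\n' > f.txt && git commit -q -am "topic: v3"

# The merge base is still the original fork point, not topic's v2 commit...
git checkout -q main
mb=$(git merge-base main topic)
# ...so Git compares v1->v2 (main) with v1->v3 (topic) and reports a conflict.
if git merge -q --no-edit topic; then remerge=clean; else remerge=conflict; fi
echo "re-merge: $remerge"
git merge --abort 2>/dev/null || true
```

Had topic been merged with a regular merge instead, the merge base for the second merge would be the v2 commit, and the same second merge would go through cleanly.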
If you are doing squash merges specifically to make git log output "look nicer" (show more of a Big Picture instead of obscuring it with too many details), note that there are various git log options designed to be selective about which commits it shows. In particular, --first-parent avoids traversing merged-in branches entirely, showing only the merge itself and then continuing down the "main line" of the commit graph. You can also use --simplify-by-decoration to show only commits that some branch or tag points to, for instance.
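A sketch of the --first-parent view, assuming a POSIX shell, Git, and a throwaway repository with placeholder names; git rev-list accepts the same --first-parent option, which makes the commit counts easy to compare:

```shell
set -e
# Throwaway repository; identity, branch and file names are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt && git add base.txt && git commit -q -m base
git checkout -q -b topic
for i in 1 2 3; do
  echo "$i" > "t$i.txt" && git add "t$i.txt" && git commit -q -m "topic $i"
done
git checkout -q main
git merge -q --no-ff --no-edit topic

git --no-pager log --oneline                  # merge, three topic commits, base
git --no-pager log --oneline --first-parent   # merge and base only
all=$(git rev-list --count HEAD)
first=$(git rev-list --count --first-parent HEAD)
echo "all commits: $all, first-parent chain: $first"
```

The topic commits are still in the repository and still reachable through the merge; --first-parent only chooses not to walk into them.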
[6] Well, also in your reflogs; but your reflogs are private, and eventually expire, so we'll just ignore them.
[7] This assumes that they (whoever "they" are) do not "rewind" their commit graph, by rebasing or removing published commits. If they do remove published commits, your Git will by default merge those commits back in, as if they were your own work. This is one reason anyone publishing a Git repository should think hard about "rewinding" such commits.
[8] Assuming no fancy octopus merges.