I've read about the Git internals here and here and know what a commit is, as well as a tree and a blob.
I know that Git stores individual files instead of file differences (deltas), and that the latter are calculated on the fly as necessary. The documentation also speaks often about the "difference between two commits" (whether they are parent and child, ancestor and descendant, or neither).
However, it's not clear to me how Git calculates those deltas in various situations (cherry-picking, merge, rebase). And which files (i.e. files from which commit) are considered in each case?
I've read that, according to that structure, a single commit can be considered a whole branch (i.e. the commit history leading up to that commit), in the sense that for a given file I can reach all of its versions by traversing the branch backwards (though not necessarily back to its root, I suppose; going back to the immediately previous file version may be enough). If my assumption is wrong, please clarify.
The rules are simple enough conceptually but get complicated in practice.
git merge uses the commit DAG to find the merge base(s). The merge base is defined as the Lowest Common Ancestor (generalized in the obvious way to arbitrary DAGs where there may be multiple LCAs, vs simple trees where there's always a unique LCA). The git merge-base command will, given two commits, find one (the default) or all (--all) merge base commits from the DAG.
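To make the "lowest common ancestor in a DAG" definition concrete, here is a minimal Python sketch. It is not how Git implements git merge-base; the commit names and the parents dictionary are hypothetical, and the brute-force approach is chosen for readability rather than speed.

```python
def ancestors(parents, commit):
    """Return `commit` plus every commit reachable from it via parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


def merge_bases(parents, a, b):
    """Return the common ancestors of a and b that no other common ancestor descends from."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return sorted(
        c for c in common
        if not any(d != c and c in ancestors(parents, d) for d in common)
    )


# Hypothetical histories: each commit maps to its list of parents.
LINEAR_FORK = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
CRISS_CROSS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["C", "B"]}

print(merge_bases(LINEAR_FORK, "C", "D"))   # ['B']
print(merge_bases(CRISS_CROSS, "D", "E"))   # ['B', 'C']
```

The criss-cross history is the classic case with two merge bases: neither B nor C is an ancestor of the other, so both are "lowest".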
If there are multiple merge bases, the algorithm depends on the -s (strategy) argument. The default recursive strategy merges the merge-bases using recursion (what else? :-) ). This is currently done the slow-simple-stupid way: if there are 5 merge bases, Git merges two of them (finding the merge base of those two as needed) and makes a "virtual commit" from the result, merges that result with the next (3rd) in the list-of-5, merges that result with the 4th, and merges that with the 5th to get the final virtual merge base. (To make this all work correctly, I believe Git actually makes real commits. There's no reason not to: these unreferenced commits will be garbage-collected automatically later.)
The resolve strategy simply picks one of the multiple merge bases and uses that as the base.
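The order in which the recursive strategy folds several merge bases into one virtual base can be sketched as a left fold. This is only an illustration of the ordering described above: merge_two is a hypothetical stand-in for "merge these two commits and return the resulting virtual commit", and here it just builds a label.

```python
from functools import reduce


def fold_merge_bases(bases, merge_two):
    """Merge the bases pairwise, left to right, into a single virtual base."""
    return reduce(merge_two, bases)


def label_merge(a, b):
    """Hypothetical stand-in for a real merge: record what was merged with what."""
    return f"({a}+{b})"


virtual_base = fold_merge_bases(["b1", "b2", "b3", "b4", "b5"], label_merge)
print(virtual_base)  # ((((b1+b2)+b3)+b4)+b5)
```

With a single merge base the fold returns it unchanged, which matches the ordinary case where no virtual commit is needed.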
In any case, the two diffs that get combined, once we have a single merge base hash ID $base and the two branch-tips, are the output from:
git diff $base $tip1
git diff $base $tip2
(more or less: there's some tweaking of the --rename-limit value if needed, depending on extra merge command arguments, and all this assumes no special merge drivers; the actual merging happens file-by-file, but the merge base version for each file comes from $base, with any rename detection happening first from the two commit-wide diffs).
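The way the two diffs are combined can be illustrated with a deliberately coarse Python sketch that works on whole files. This is an assumption made for brevity: Git merges changed regions within each file and also handles renames, neither of which is modelled here. Each snapshot is a hypothetical dictionary mapping a path to that file's content.

```python
def three_way(base, ours, theirs):
    """Combine base->ours and base->theirs at whole-file granularity.

    Returns (merged_snapshot, conflicting_paths).
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:          # both sides agree (or neither touched the file)
            result = o
        elif o == b:        # only "theirs" changed it
            result = t
        elif t == b:        # only "ours" changed it
            result = o
        else:               # both changed it, differently
            conflicts.append(path)
            continue
        if result is not None:   # None means the file was deleted
            merged[path] = result
    return merged, conflicts


base = {"a.txt": "1", "b.txt": "1", "c.txt": "1", "d.txt": "1"}
tip1 = {"a.txt": "2", "b.txt": "1", "c.txt": "2"}               # edits a and c, deletes d
tip2 = {"a.txt": "1", "b.txt": "3", "c.txt": "4", "d.txt": "1"}  # edits b and c

merged, conflicts = three_way(base, tip1, tip2)
print(merged)     # {'a.txt': '2', 'b.txt': '3'}
print(conflicts)  # ['c.txt']
```

The key point is that every decision is made relative to the base version: a side "changed" a file only if its version differs from the one in the base.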
The git cherry-pick command diffs each commit against its parent, and then first tries to apply the resulting delta as a patch. If that fails, it falls back on a three-way merge, but the merge base is chosen file by file rather than commit by commit, because it uses the Index: information in the formatted patch. There's one Index: line per file in the patch, giving the SHA-1 IDs of the two blobs in question.
Thus, the merge base is initially ignored entirely: the cherry-pick just uses the patch as a patch. Only if the patch does not apply (as in git apply) does the cherry-pick fall back to a three-way merge (as in git apply -3). The blob itself must also exist in your repository: for a cherry-pick, it always does; for a literal git apply of an emailed patch, it may not.
At this point the two diffs to be combined are:
git diff $indexbase $file1
the diff in the patch   # equivalent to git diff $indexbase $file2
Here $indexbase is the file extracted by the hash ID in the Index: line and $file1 is the file in your work-tree. (This file matches the HEAD commit unless you're using git cherry-pick -n.) In an arbitrary (emailed) patch you don't necessarily have $file2 at all, just the diff; in a cherry-picked patch, $file2 is the version of the file in the commit being cherry-picked (but it's not needed since we already have the diff!).
If you cherry-pick a merge commit, you must tell Git which parent of that merge commit is to be used to produce a changeset-as-patch (with the -m <parent-number> option). This step is completely manual.
A rebase consists, functionally, of a series of cherry-pick operations. Merge commits are omitted from rebases. (Interactive rebase's --preserve-merges operation makes new merges, completely ignoring the original merge.) An interactive rebase literally runs git cherry-pick (one at a time for each commit to be copied), while a non-interactive rebase attempts to use git format-patch <args> | git am -3 if it can (format-patch elides "empty" commits so this is only possible without
The commits to be copied are chosen via an actual git rev-list --cherry-pick on a symmetric difference in some cases, or, for algorithmic purposes, something equivalent.
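As a rough model of that selection, the Python sketch below takes the commits reachable from the branch but not from the upstream, drops merges, and drops any commit whose change already appears on the upstream side. It is not Git's implementation: the patch_id dictionary is a hypothetical stand-in for the content-based IDs Git uses to recognize equivalent changes, and the commit names are invented.

```python
def ancestors(parents, commit):
    """Return `commit` plus every commit reachable from it via parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


def oldest_first(parents, tip, keep):
    """List the commits in `keep` reachable from `tip`, parents before children."""
    order, seen = [], set()

    def visit(c):
        if c in seen or c not in keep:
            return
        seen.add(c)
        for p in parents.get(c, []):
            visit(p)
        order.append(c)

    visit(tip)
    return order


def commits_to_copy(parents, patch_id, upstream, branch):
    """Pick the commits a rebase of `branch` onto `upstream` would copy."""
    up, br = ancestors(parents, upstream), ancestors(parents, branch)
    only_upstream, only_branch = up - br, br - up   # the symmetric difference
    already_upstream = {patch_id[c] for c in only_upstream}
    return [
        c for c in oldest_first(parents, branch, only_branch)
        if len(parents.get(c, [])) < 2              # merges are omitted
        and patch_id[c] not in already_upstream     # equivalent change exists
    ]


# Hypothetical history: F2 makes the same change as U2.
parents = {"A": [], "U1": ["A"], "U2": ["U1"], "F1": ["A"], "F2": ["F1"], "F3": ["F2"]}
patch_id = {"U1": "p1", "U2": "p-fix", "F1": "p2", "F2": "p-fix", "F3": "p3"}

print(commits_to_copy(parents, patch_id, "U2", "F3"))  # ['F1', 'F3']
```

Each selected commit would then be cherry-picked in that order onto the upstream tip.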