ashays - 1 year ago

Understanding "git pull --rebase" vs "git rebase"

According to my understanding of git pull --rebase origin master, it should be the equivalent of running the following commands:

(from branch master): $ git fetch origin
(from branch master): $ git rebase origin/master

I seem to have found some case where this doesn't work as expected. In my workspace, I have the following setup:

  • Branch origin/master references branch master on remote origin.

  • Branch master is set up to track origin/master, and is behind the remote's master by several commits.

  • Branch feature is set up to track local branch master, and is ahead of master by several commits.

Sometimes I lose commits by running the following sequence of steps:

(from branch master): $ git pull --rebase
(from branch master): $ git checkout feature
(from branch feature): $ git pull --rebase

At this point, the few commits I was ahead by on feature have now been lost. Now, if I reset my position and instead do the following:

(from branch feature): $ git reset --hard HEAD@{2} # rewind to before second git pull
(from branch feature): $ git rebase master

The commits have been applied correctly and my new commits on feature are still present. This seems to directly contradict my understanding of how git pull works, unless git fetch . does something stranger than I expected.

Unfortunately, this is not 100% reproducible for all commits. When it does work for a commit, though, it works every time.

Note: my git pull --rebase here actually preserves merges, if that matters, because I have the following in my Git configuration:

rebase = preserve

Answer

(Edit, 30 Nov 2016: see also this answer to "Why is git rebase discarding my commits?". It is now virtually certain that the cause is the fork-point option.)

There are a few differences between manual and pull-based git rebase (fewer now in Git 2.7 than there were in versions of Git predating the --fork-point option in git merge-base). And I suspect your automatic preserve-merges may be involved. It's a bit hard to be sure, but the fact that your local branch follows your other local branch, which is getting rebased, is quite suggestive. Meanwhile, the old git pull script was also rewritten in C recently, so it's harder to see what it does (though you can set the environment variable GIT_TRACE to 1 to make Git show you the commands it runs internally).
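
The GIT_TRACE tip can be tried on a scratch setup. This is a minimal sketch, not a replay of your repository: a local bare repository (origin.git) stands in for the real origin, a second clone (other) plays whoever pushes upstream, and commit_file is a hypothetical helper that adds and commits one file. Trace lines go to stderr, so they are captured in trace.log; their exact wording differs between Git versions.

```shell
#!/bin/sh
# Sketch: log what "git pull --rebase" runs internally via GIT_TRACE=1.
# Assumption: a local bare repo (origin.git) stands in for the real origin.
set -e
tmp=$(mktemp -d)

commit_file() {   # hypothetical helper: write <label>.txt and commit it
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

git init -q --bare "$tmp/origin.git"

# "work": our clone; master tracks origin/master after "push -u".
git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
git remote add origin "$tmp/origin.git"
commit_file A
git push -q -u origin master

# "other": someone else pushes commit C upstream.
git clone -q -b master "$tmp/origin.git" "$tmp/other"
cd "$tmp/other"
git config user.name demo
git config user.email demo@example.com
commit_file C
git push -q origin master

# Back in "work": make a local commit, then pull with tracing enabled.
cd "$tmp/work"
commit_file I
GIT_TRACE=1 git pull -q --rebase 2> "$tmp/trace.log"

# Show the logged fetch / merge-base / rebase steps.
grep -E 'fetch|merge-base|rebase' "$tmp/trace.log"
```

With the C version of pull I would expect the log to include the internal fetch, a git merge-base --fork-point call and a git rebase --onto <new-tip> <fork-point> call, which is the shape described further down; check your own version's output rather than relying on that.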

In any case, there are two or three key items here (depending on how you count and split these up, I'll make it into 3):

  • git pull runs git fetch, then either git merge or git rebase per instructions, but when it runs git rebase it uses the new fork-point machinery to "recover from an upstream rebase".

  • When git rebase is run with no arguments it has a special case that invokes the fork-point machinery. When run with arguments, the fork-point machinery is disabled unless explicitly requested with --fork-point.

  • When git rebase is instructed to preserve merges, it uses the interactive rebase code (non-interactively). I'm not sure this actually matters here (hence "may be involved" above). Normally git rebase flattens away merges, and only the interactive rebase script has code to preserve them (this code actually re-does the merges, since there's no other way to deal with them).
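
The first two points can be checked directly. Below is a minimal sketch that builds the upstream-rebase case drawn further down (B replaced upstream by a copy, called B2 here) and rebases the same branch three ways. Assumptions: origin/foo is moved with git update-ref rather than a real fetch (each move is still recorded in the origin/foo reflog), the origin URL is a dummy that is never contacted, and commit_file is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: one upstream-rebased history, rebased three different ways.
# Assumptions: origin/foo is moved with "git update-ref" (each move lands in
# its reflog, as a fetch would) and the "origin" URL is a dummy, never used.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
git remote add origin "$repo/unused.git"   # lets origin/foo act as an upstream

commit_file() {   # hypothetical helper: write <label>.txt and commit it
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit_file A; A=$(git rev-parse HEAD)
commit_file B
git update-ref refs/remotes/origin/foo HEAD      # origin/foo was at B
git checkout -q -b foo
git branch -q --set-upstream-to=origin/foo foo
commit_file I; commit_file J; commit_file K
K=$(git rev-parse HEAD)

# Upstream force-push: B2 (standing in for B') replaces B, then C, D, E.
git checkout -q --detach "$A"
commit_file B2; commit_file C; commit_file D; commit_file E
git update-ref refs/remotes/origin/foo HEAD      # next "fetch" sees E
git checkout -q foo

git rebase -q                          # no <upstream>: fork-point is on
no_args=$(git rev-list --count origin/foo..foo)

git reset -q --hard "$K"
git rebase -q origin/foo               # explicit <upstream>: fork-point is off
explicit=$(git rev-list --count origin/foo..foo)

git reset -q --hard "$K"
git rebase -q --fork-point origin/foo  # explicit, but fork-point requested
requested=$(git rev-list --count origin/foo..foo)

echo "above origin/foo: no args=$no_args explicit=$explicit --fork-point=$requested"
```

If the fork-point machinery behaves as described, the no-argument and --fork-point runs leave only copies of I, J and K above origin/foo, while the explicit-upstream run also carries along a copy of B.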

The most important item here (for sure) is the fork point code. This code uses the reflog to handle cases best shown by drawing part of the commit graph.

In a normal (no fork point stuff needed) rebase case you have something like this:

... - A - B - C - D - E   <-- origin/foo
            \
              I - J - K   <-- foo

where A and B are commits you had when you started your branch (so that B is the merge-base), C through E are new commits you picked up from the remote via git fetch, and I through K are your own commits. The rebase code copies I through K, attaching the first copy to E, the second to the-copy-of-I, and the third to the-copy-of-J.

Git figures out (or used to, anyway) which commits to copy using git rev-list origin/foo..foo, i.e., using the name of your current branch (foo) to find K and work backwards, and the name of its upstream (origin/foo) to find E and work backwards. The backwards march stops at the merge base, in this case B, and the copied result looks like this:

... - A - B - C - D - E   <-- origin/foo
           \            \
            \             I' - J' - K'   <-- foo
              I - J - K   [foo@{1}: reflog for foo]
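
The normal case is easy to reproduce in a scratch repository. A minimal sketch, assuming origin/foo is set with git update-ref instead of a real fetch and commit_file is a hypothetical helper: git log origin/foo..foo lists the commits the rebase will copy, the plain merge base is B, and after the rebase foo@{1} still names the original K.

```shell
#!/bin/sh
# Sketch of the normal case: origin/foo only moved forward from B to E.
# Assumption: origin/foo is set with "git update-ref" instead of a real fetch.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com

commit_file() {   # hypothetical helper: write <label>.txt and commit it
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit_file A; commit_file B
B=$(git rev-parse HEAD)
git checkout -q -b foo
commit_file I; commit_file J; commit_file K
K=$(git rev-parse HEAD)

# Upstream gains C, D, E on top of B; our "fetch" moves origin/foo to E.
git checkout -q --detach "$B"
commit_file C; commit_file D; commit_file E
git update-ref refs/remotes/origin/foo HEAD
git checkout -q foo

# Selected for copying: reachable from foo, not from origin/foo.
to_copy=$(git log --reverse --format=%s origin/foo..foo | paste -sd ' ' -)
base=$(git merge-base origin/foo foo)
echo "to copy: $to_copy"
echo "merge base: $(git log -1 --format=%s "$base")"

git rebase -q origin/foo
echo "I' now sits on: $(git log -1 --format=%s foo~3)"
echo "foo@{1} (old tip): $(git rev-parse 'foo@{1}')"
```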

The problem with this method occurs when the upstream—origin/foo here—is itself rebased. Let's say, for instance, that on origin someone force-pushed so that B was replaced by a new copy B' with different commit wording (and maybe a different tree as well, but, we hope, nothing that affects our I-through-K). The starting point now looks like this:

          B' - C - D - E    <-- origin/foo
        /
... - A - B   <-- [origin/foo@{n}]
            \
              I - J - K   <-- foo

Using git rev-list origin/foo..foo, we'd select commits B, I, J, and K to be copied, and try to paste them on after E as usual; but we don't want to copy B, as it really came from origin and has been replaced with its own copy B'.

What the fork point code does is look at the reflog for origin/foo to see if B was reachable at some time. That is, it checks not just origin/foo (finding E and scanning back to B' and then A), but also origin/foo@{1} (pointing directly to B, probably, depending on how frequently you run git fetch), origin/foo@{2}, and so on. Any commits on foo that are reachable from any origin/foo@{n} are included for consideration in finding a Lowest Common Ancestor node in the graph (i.e., they're all treated as options to become the merge base that git merge-base prints out).
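
Here is a sketch of that reflog lookup for the force-push case. Assumptions: origin/foo is moved with git update-ref so that its reflog holds B and then E, B2 stands in for B', and commit_file is a hypothetical helper. It compares plain git merge-base with git merge-base --fork-point and counts what each choice of base would select.

```shell
#!/bin/sh
# Sketch: fork-point lookup after an upstream force-push (B replaced by B2).
# Assumption: origin/foo is moved with "git update-ref", which records each
# move in the origin/foo reflog the same way a fetch would.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com

commit_file() {   # hypothetical helper: write <label>.txt and commit it
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit_file A; A=$(git rev-parse HEAD)
commit_file B; B=$(git rev-parse HEAD)
git update-ref refs/remotes/origin/foo "$B"   # earlier fetch: origin/foo at B
git checkout -q -b foo
commit_file I; commit_file J; commit_file K

# Force-push upstream: B2 (standing in for B') replaces B, then C, D, E.
git checkout -q --detach "$A"
commit_file B2; commit_file C; commit_file D; commit_file E
git update-ref refs/remotes/origin/foo HEAD   # later fetch: origin/foo at E
git checkout -q foo

git log -g --format='%gd %s' origin/foo       # the reflog: E first, then B

plain=$(git merge-base origin/foo foo)              # ignores the reflog
fork=$(git merge-base --fork-point origin/foo foo)  # consults the reflog
echo "merge-base: $(git log -1 --format=%s "$plain")"
echo "fork-point: $(git log -1 --format=%s "$fork")"
echo "selected with plain base: $(git rev-list --count origin/foo..foo)"
echo "selected with fork point: $(git rev-list --count "$fork"..foo)"
```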

(It's worth noting a defect of sorts here: this automated fork point detection can only find commits that were reachable for the time that the reflog entry is maintained, which in this case defaults to 30 days. However, that's not particularly relevant to your issue.)
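
If that window matters, the setting involved is gc.reflogExpireUnreachable: the reflog entries fork-point needs are the ones whose commits are no longer reachable from the ref's current tip, and those expire on that schedule (30 days by default). A minimal sketch on a scratch repository; 90 days is an arbitrary example value, not a recommendation.

```shell
#!/bin/sh
# Sketch: check and lengthen the window for unreachable reflog entries,
# which bounds how far back fork-point detection can look.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Unset means Git's built-in default (30 days) applies.
git config --get gc.reflogExpireUnreachable ||
  echo "gc.reflogExpireUnreachable unset: default of 30 days applies"

# Example only: keep such entries for 90 days in this repository.
git config gc.reflogExpireUnreachable "90 days"
git config --get gc.reflogExpireUnreachable
```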

In your case, you have three branch names (and hence three reflogs) involved:

  • origin/master, which is updated by git fetch (the first step of your git pull while on branch master)
  • master, which is updated by both you (via normal commits) and git rebase (the second step of your git pull), and
  • feature, which is updated by both you (via normal commits) and git rebase (the second step of your second git pull: you "fetch" from yourself, a no-op, then rebase feature on master).

Both rebases are run with --preserve-merges (hence non-interactive interactive mode) and --onto new-tip fork-point, where the fork-point commit ID is found by running git merge-base --fork-point upstream-name HEAD. The upstream-name for the first rebase is origin/master (well, refs/remotes/origin/master) and the upstream-name for the second rebase is master (refs/heads/master).
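
The second rebase can be done by hand to see those two steps separately. A minimal sketch under these assumptions: everything is local (master simply gains C, D and E, as if the first pull had fast-forwarded it), feature tracks local master, commit_file is a hypothetical helper, and --preserve-merges is left out because Git 2.34 and later no longer accept it (--rebase-merges is the replacement).

```shell
#!/bin/sh
# Sketch: the second rebase done by hand, roughly what "git pull --rebase"
# on feature (upstream: local master) boils down to.
# --preserve-merges is omitted (removed in Git 2.34; see --rebase-merges).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com

commit_file() {   # hypothetical helper: write <label>.txt and commit it
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit_file A; commit_file B
git checkout -q -b feature
git branch -q --set-upstream-to=master feature   # feature tracks local master
commit_file I; commit_file J; commit_file K

# Stand-in for the first pull: master moves ahead to C - D - E.
git checkout -q master
commit_file C; commit_file D; commit_file E
git checkout -q feature

upstream=$(git rev-parse --symbolic-full-name '@{upstream}')  # refs/heads/master
fork=$(git merge-base --fork-point "$upstream" HEAD)          # uses master's reflog
echo "fork point: $(git log -1 --format=%s "$fork")"
git rebase -q --onto "$upstream" "$fork"
git log --oneline --reverse master..feature
```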

This should all Just Work. If your commit graph at the start of the whole process is something like what you've described:

... - A - B   <-- master, origin/master
            \
              I - J - K   <-- feature

then the first fetch brings in some commits and makes origin/master point to the new tip:

              C - D - E   <-- origin/master
            /
... - A - B   <-- master, origin/master@{1}
            \
              I - J - K   <-- feature

and the first rebase then finds nothing to copy (the merge base of master and B, where B is fork-point(master, origin/master), is just B itself), giving:

              C - D - E   <-- master, origin/master
            /
... - A - B   <-- master@{1}, origin/master@{1}
            \
              I - J - K   <-- feature

The second fetch is from yourself and a no-op/skipped entirely, leaving this as the input to the second rebase. The --onto target is master which is commit E and the fork-point of HEAD (feature) and master is also commit B, leaving commits I through K to copy after E as usual.
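
The whole sequence from the question can also be replayed end to end. This sketch assumes a local bare repository as origin, a second clone that pushes C, D and E, feature set to track local master, no preserve-merges setting, and a hypothetical commit_file helper. On a history this simple it should end with copies of I, J and K on top of E, i.e., the case that Just Works; it is not a reproduction of the lost commits.

```shell
#!/bin/sh
# Sketch: replay the question's sequence end to end on a throwaway setup.
# Assumptions: a local bare repo plays "origin", a second clone pushes
# C, D, E, and no preserve-merges setting is configured.
set -e
tmp=$(mktemp -d)

commit_file() {   # hypothetical helper: write <label>.txt and commit it
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

git init -q --bare "$tmp/origin.git"
git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
git remote add origin "$tmp/origin.git"
commit_file A; commit_file B
git push -q -u origin master                      # master tracks origin/master

git checkout -q -b feature
git branch -q --set-upstream-to=master feature    # feature tracks local master
commit_file I; commit_file J; commit_file K

# Someone else pushes C, D, E to origin's master.
git clone -q -b master "$tmp/origin.git" "$tmp/other"
cd "$tmp/other"
git config user.name demo
git config user.email demo@example.com
commit_file C; commit_file D; commit_file E
git push -q origin master

# The question's steps.
cd "$tmp/work"
git checkout -q master
git pull -q --rebase
git checkout -q feature
git pull -q --rebase

git log --oneline --reverse master..feature       # copies of I, J, K
git log -g --oneline -n 2 feature                 # feature's latest reflog entries
```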

If some commit(s) are being dropped, something is going wrong in this process, but I can't see what.
