kopranb - 4 months ago
Git Question

How to deal with merges of topic branches after a squashed merge of parent

Say that I have the following scenario:

A -- B -- C -- D -- H   master
     \            /
      E -- F -- G       topicA
           \
            I -- J -- K topicB


topicA was merged into master using the --squash switch, which means that master doesn't know the history of topicA.

If I now merge master into topicB and then do a diff master...topicB, the diff is messed up and contains a lot of changes or undoings which shouldn't be there.

In that case, I usually merge master into topicA and then topicA into topicB before doing what was said in the previous paragraph. However, sometimes that's not possible (e.g. the branch was deleted) and I end up with a lot of conflicts.

How should I proceed in this case? Do I have any misconception?

Is rebase --onto master topicA topicB the right solution?

Answer

As DavidN said in a comment, your plan looks sound enough.

(The rest of this is far too long and rambly; it was written between / during other tasks.)

Digressions and specifics

Since H was created by git merge --squash, the drawing is wrong. It should read:

A -- B -- C -- D -- H master
     \
      E -- F -- G   topicA
           \
            I -- J -- K topicB

The key difference is that commit H is not related to the E--F--G sequence, at least not in any Git-detected sense. Commit H's content is affected by whatever happened in the E--F--G sequence (and, of course, by whatever happened in the C--D sequence) but as far as Git knows now, someone came along and wrote H without even looking at E--F--G.
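To see this concretely, here is a minimal sketch that rebuilds the graph in a throwaway repository. It assumes every commit simply adds one file named after itself; the helper c and the file names are made up for this illustration and are not part of the question.

```shell
# Throwaway repository; every commit adds one file named after itself.
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name demo && git config user.email demo@example.com
c() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }  # hypothetical helper

c A; c B
git checkout -q -b topicA; c E; c F
git checkout -q -b topicB; c I; c J; c K
git checkout -q topicA;    c G
git checkout -q master;    c C; c D
git merge -q --squash topicA >/dev/null 2>&1 && git commit -q -m H

# H records a single parent (D); G is not among its ancestors.
git log -1 --format='%s has parent(s): %P' master
git merge-base --is-ancestor topicA master || echo "G is not an ancestor of H"
# H's tree nevertheless contains the files added by E, F, and G.
ls
```

The single parent hash printed for H is the whole story: the content of E--F--G is in H's snapshot, but the graph has no link to those commits.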

I'm going to have a fairly big digression here now.


If I now merge master into topicB

If you did a real merge

OK, let's draw that as a commit graph, to make sure that this means what you intend. I will use my usual form (slightly more compact, with arrows from branch-names scooted over to the right a bit):

A--B--C---D----H       <-- master (still points to H)
    \           \
     E--F--G     \     <-- topicA (still points to G)
         \        \
          I--J--K--L   <-- topicB (points to new L)

Note that I drew a real merge, not a fake, not-a-merge-at-all "squash merge". This really does matter, as we will see.

When Git goes to make this new commit L, it has to merge commits H and K. To do that, it has to find their merge base (in some cases there can be several merge bases, but here there's just one).

The merge base(s) of any two commits is / are the Lowest Common Ancestors: that is, the commits closest to the two starting commits (H and K) that are reachable from both of those starting commits.

Let's start with H and K themselves first. H is reachable from H (of course) but not from K. K is reachable from K (of course) but not from H. Now we can check D vs H and K: D is reachable from H but not from K. Now we can check J, but it's not reachable from H. Now we consider C (reachable only from H) and I, F, and E (reachable only from K), but it's not until we get all the way back to commit B that we find a commit reachable from both H and K. Commit A would also work, but it's further away from both H and K, so commit B is the merge base.
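You don't have to do this walk by hand: git merge-base does it for you. The sketch below assumes the same kind of throwaway repository as the drawing (one file per commit, a made-up helper c) and asks Git for the merge base of master and topicB before any merge of master into topicB has happened.

```shell
# Throwaway repository; every commit adds one file named after itself.
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name demo && git config user.email demo@example.com
c() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }  # hypothetical helper

c A; c B
git checkout -q -b topicA; c E; c F
git checkout -q -b topicB; c I; c J; c K
git checkout -q topicA;    c G
git checkout -q master;    c C; c D
git merge -q --squash topicA >/dev/null 2>&1 && git commit -q -m H

# Ask Git for the lowest common ancestor of H and K.
base=$(git merge-base master topicB)
git log -1 --format='merge base of master and topicB: %s' "$base"
# prints: merge base of master and topicB: B
```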

The merge then starts with two diffs:

git diff B H

and

git diff B K

The first diff shows what we changed going from B to H. Of course, H has what we changed in C and D, plus whatever we changed in E, F, and G. The second diff shows what we changed going from B to K. Of course, K has what we changed in E and F (topicB forks off at F), plus whatever we changed in I--J--K.

This has whatever we changed in E and F twice, but Git usually (not always, but usually) does a good job of noticing that and picking up the change only once. So commit L probably has everything from every previous commit, done just once.
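Here is a rough sketch of those two diffs and the resulting real merge. It assumes a throwaway repository in which each commit only adds one file named after itself (the helper c is made up), so the duplicated changes are identical on both sides and Git can trivially take them once; real histories are rarely this clean.

```shell
# Throwaway repository; every commit adds one file named after itself.
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name demo && git config user.email demo@example.com
c() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }  # hypothetical helper

c A; c B
git checkout -q -b topicA; c E; c F
git checkout -q -b topicB; c I; c J; c K
git checkout -q topicA;    c G
git checkout -q master;    c C; c D
git merge -q --squash topicA >/dev/null 2>&1 && git commit -q -m H

base=$(git merge-base master topicB)   # commit B
git diff --name-only "$base" master    # files added by C, D, E, F, G
git diff --name-only "$base" topicB    # files added by E, F, I, J, K

# Real merge of master into topicB: new commit L with parents K and H.
git checkout -q topicB
git merge -q -m L master
git log -1 --format='%s parents: %p' topicB
```

Both name lists contain the files from the commits shared by the two sides, yet the merge result holds each of them exactly once.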

and then do a diff master...topicB

Note that this is using the three-dot ... syntax, not the two-dot .. syntax. I'm not sure what you intend here, but the three-dot syntax essentially means "find the (or a) merge base". So let's go through this exercise again: master still points to commit H and topicB now points to the new merge commit L, and we find the merge base of H and L, now that we have a real merge (none of this stupid "squash merge" fake merge stuff for us, no way!).

So let's start with H and L themselves first. L is reachable from L (of course) but not from H. H is reachable from H (of course) and also from L. This means the merge base of H and L is H: the merge base of master and topicB is master.

... diff master...topicB

Since master is on the left of the triple-dot, it's replaced with the merge base, which is commit H. The right side of the triple-dot is resolved to its commit, which is commit L. The diff then shows you whatever is different between H and L.

In this case, the effect is the same as for git diff master..topicB, which means the same thing as git diff master topicB: compare commits H and L, in that order.

That should be a pretty sensible diff, in spite of the horrible fake squash-merge we did initially to make H. The real merge sort of repaired this, at least for H vs L.
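A small sketch to check this claim, again assuming a throwaway repository with one file per commit and a made-up helper c. After a real merge of master into topicB, the three-dot and two-dot diffs should agree.

```shell
# Throwaway repository; every commit adds one file named after itself.
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name demo && git config user.email demo@example.com
c() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }  # hypothetical helper

c A; c B
git checkout -q -b topicA; c E; c F
git checkout -q -b topicB; c I; c J; c K
git checkout -q topicA;    c G
git checkout -q master;    c C; c D
git merge -q --squash topicA >/dev/null 2>&1 && git commit -q -m H

git checkout -q topicB && git merge -q -m L master   # real merge: L has parents K and H

# master (H) is now an ancestor of topicB (L), so H itself is the merge base.
[ "$(git merge-base master topicB)" = "$(git rev-parse master)" ] && echo "merge base is H"
git diff --name-only master...topicB   # same list as: git diff --name-only master topicB
```

In this toy repository the diff lists only I.txt, J.txt, and K.txt, which is the work that is on topicB but not on master.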

If you did a fake, not-a-merge "squash merge"

Let's draw this thing yet again but this time using the fake not-a-merge git merge --squash technique. The contents of our new commit L will be the same as if we had done a real merge, but the graph will be different:

A--B--C---D----H       <-- master (still points to H)
    \
     E--F--G           <-- topicA (still points to G)
         \
          I--J--K--L   <-- topicB (points to new L)

Now we go back to:

diff master...topicB

Once again, we need to find the merge base between H and L, but now L does not point back to both K and H, but only to K. Neither H nor L is the merge base. Neither D nor J works either: we can't walk backwards to D from L, and we can't walk backwards to J from H. In fact, the merge base commit is again commit B, so this means the same thing as:

git diff B L

and this diff will be quite different.
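To illustrate how different, here is a sketch under the same assumptions as before (throwaway repository, one file per commit, made-up helper c), this time bringing master into topicB with git merge --squash. The three-dot diff then compares B with L, not H with L.

```shell
# Throwaway repository; every commit adds one file named after itself.
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name demo && git config user.email demo@example.com
c() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }  # hypothetical helper

c A; c B
git checkout -q -b topicA; c E; c F
git checkout -q -b topicB; c I; c J; c K
git checkout -q topicA;    c G
git checkout -q master;    c C; c D
git merge -q --squash topicA >/dev/null 2>&1 && git commit -q -m H

git checkout -q topicB
git merge -q --squash master >/dev/null 2>&1 && git commit -q -m L   # L has only K as parent

git log -1 --format='merge base: %s' "$(git merge-base master topicB)"
git diff --name-only master...topicB   # B vs L: everything since B shows up
git diff --name-only master topicB     # H vs L: only the files from I, J, K
```

The first diff lists the files from C, D, E, F, and G as well, even though master already has all of them. That may be the kind of "changes which shouldn't be there" the question describes, though I can't be sure that is what happened in your repository.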

I do not know what you were expecting from your diff, so I cannot address this part:

the diff is messed up and contains a lot of changes or undoings which shouldn't be there.


Back to merge, rebase, etc.

Now let's return to the question:

In that case, I usually merge master into topicA and then topicA into topicB before doing what was said in the previous paragraph. However, sometimes that's not possible (e.g. the branch was deleted) and I end up with a lot of conflicts.

Note that deleting a branch name has no immediate effect upon its commits. What it does do is to stop protecting those commits. That is, because each branch name makes commits reachable, those commits are safe from the Grim Collector ...er... Grim Reaper Garbage Collector. We did that reachability thing several times to find merge bases; Git does it even more often, though, to find commits to keep and commits to discard, during GC; commits to transfer, during push and fetch; and so on. If the commits are protected by some other means—by reachability through a real merge, or by reachability from another branch or tag name, or whatever—they stick around. If you can find them by hash ID, you can bring them back.

More importantly for your rebase case, if you can find them by git log, you can cut them off. We'll see this in a moment.
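As a small illustration of the point about deleted branch names, the sketch below deletes a branch and then brings it back by hash ID. It uses a deliberately tiny made-up repository (one commit per branch, hypothetical helper c) and assumes you saved the hash before deleting; if you did not, HEAD's reflog (git reflog) often still lists it.

```shell
# Tiny throwaway repository: A on master, E on topicA.
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name demo && git config user.email demo@example.com
c() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }  # hypothetical helper

c A
git checkout -q -b topicA; c E
git checkout -q master

tip=$(git rev-parse topicA)   # remember the hash ID before deleting the name
git branch -q -D topicA       # removes the name only
git cat-file -t "$tip"        # the commit object is still in the repository
git branch topicA "$tip"      # re-create the name at the same commit
```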

General considerations of squash "merge"

Because squash "merges" are not actually merges at all, they won't protect the other chain of commits, and (this is usually the key to the future merge conflicts) they do not provide future merges with updated merge bases. This means those future merges must examine huge diffs, instead of small diffs, and then Git's automated "redundant change" detection fails.

What this means in practice depends on how you use these squash not-a-merge "merges". When you use them to take a line of development and reduce it to a single commit, it's probably a good idea to stop using that other line of development entirely. You can save it (using a branch or tag name, or even some other reference outside the branch and tag name spaces so that you don't normally see it, that keeps the commit chain from being GC-ed) or just let it get reaped, but either way you probably should not continue working on it, and that includes any other branches you have that fork off from some commit(s) on it.
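One possible way to save a squashed line of development under a reference outside the branch and tag name spaces is git update-ref. This is only a sketch: the refs/archive/ prefix is a name I made up, and the tiny repository and helper c exist just for the illustration.

```shell
# Tiny throwaway repository: A on master, E on topicA.
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name demo && git config user.email demo@example.com
c() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }  # hypothetical helper

c A
git checkout -q -b topicA; c E
git checkout -q master

# Keep the chain reachable from a reference that is neither a branch nor a tag.
git update-ref refs/archive/topicA "$(git rev-parse topicA)"   # refs/archive/ is a made-up namespace
git branch -q -D topicA
git for-each-ref refs/archive   # the archived reference still points at commit E
git branch --list               # only master remains as a branch
```

The archived commits stay safe from GC but no longer show up in ordinary branch listings, which fits the "stop working on it" advice.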

Using git rebase

Is rebase --onto master topicA topicB the right solution?

Using git rebase, you can copy these other chains—your topicB, in this case—to new chains and then point the label (topicB) to the tip of the copied chain. The commits you want to copy are those that were not squashed: here, that's the I--J--K chain. Using topicA as the <upstream> argument to git rebase will select the right set of commits. Note that topicA reaches commits G, F, E, B, and A, while topicB reaches K, J, I, F, E, and so on; so using topicA as <upstream> chops off everything from F on back, but then requires the explicit --onto that you provided.
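Here is a sketch of that rebase on a throwaway copy of the graph. It assumes one file per commit and a made-up helper c, so no conflicts arise; in a real repository you may still have to resolve some.

```shell
# Throwaway repository; every commit adds one file named after itself.
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name demo && git config user.email demo@example.com
c() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }  # hypothetical helper

c A; c B
git checkout -q -b topicA; c E; c F
git checkout -q -b topicB; c I; c J; c K
git checkout -q topicA;    c G
git checkout -q master;    c C; c D
git merge -q --squash topicA >/dev/null 2>&1 && git commit -q -m H

# Copy the commits reachable from topicB but not from topicA (I, J, K) on top of master.
git rebase -q --onto master topicA topicB
git log --format=%s master..topicB      # the copies, newest first
git diff --name-only master...topicB    # only the files from I, J, K
```

After the rebase, master is the merge base of master and topicB, so the three-dot diff shows just the topicB work.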

If the label topicA were deleted, you could still do this rebase; it just gets trickier. What you would need to do is to specify either of commits G or F by their hash IDs, so as to chop off commits F and earlier. The hash ID of G is anywhere from hard-to-find (GC has not deleted it but it is unreachable from any live reference) to non-existent (GC has deleted it). The ID for F, however, is right there in the topicB chain: K's parent is J, J's parent is I, and I's parent is F. The problem is that there is no easy way to determine that commit F was in the set of commits that were in the chain that the earlier git merge --squash handled.
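A sketch of the deleted-label case, under the same assumptions as the drawing (throwaway repository, one file per commit, made-up helper c). The important extra assumption is that you already know, from reading git log, that I is the first commit that was not part of the squash; Git cannot tell you that.

```shell
# Throwaway repository; every commit adds one file named after itself.
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name demo && git config user.email demo@example.com
c() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }  # hypothetical helper

c A; c B
git checkout -q -b topicA; c E; c F
git checkout -q -b topicB; c I; c J; c K
git checkout -q topicA;    c G
git checkout -q master;    c C; c D
git merge -q --squash topicA >/dev/null 2>&1 && git commit -q -m H

git branch -q -D topicA                   # the label is gone
f=$(git rev-parse topicB~3)               # K -> J -> I -> F: assumes we know I is the first commit to keep
git rebase -q --onto master "$f" topicB   # F's hash ID stands in for the missing topicA
git log --format=%s master..topicB        # the copies of K, J, I
```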

(This is related to, but not quite the same thing as, the earlier remark I bolded.)
