Aram Aram - 2 months ago 38
Git Question

Removing merges from history and rebasing non sequential commits

I have the following git history, where I want to squash a commit and remove several merges:

git log --graph --oneline --all
* 80e2fa1 I want to squash this commit
* 7850013 Merge branch 'master' want to get rid of these branch merges
|\
| * b645616
| * 4aee207
* | 6077ae7 Into this one
* | bc882a2
|/
* 7607733
* 82643ed Merge branch 'master'
|\
| * ead8b48
* | 5fadbff
|/
* 2e0797c
* 3552f8f Merge branch 'master'
|\
| *
* |
|/
* e1defac
* 912c802


But when I try to do a git rebase --interactive 6077ae7, I end up getting:

The previous cherry-pick is now empty, possibly due to conflict resolution.


The merges also don't show up in the instruction list when I start the rebase from a point before the merge. Is this the correct way to remove them? The approach from "git remove merge commit from history" didn't work for me.

Answer

Rebase normally ignores merges. In your case this is actually going to be helpful!

Background

First, remember that rebase copies commits. Whichever commits actually do get rebased, get copied. (The originals remain in your repository, just "abandoned", so that you can rescue them if you do not like the result of the rebase. After some time, by default 30 days, the abandonment becomes much more permanent, as the garbage collector is now allowed to remove them for real.)
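As a small illustration of "the originals remain in your repository", here is a minimal sketch in a throwaway repository. It uses git commit --amend rather than a rebase, only because amend is the shortest way to make Git copy a commit; the file name and messages are made up for the demo. The reflog entry HEAD@{1} still names the abandoned original.

```shell
#!/bin/sh
set -e

# Throwaway repository, so nothing here touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "original message"
original=$(git rev-parse HEAD)

# --amend copies the tip commit with a new message; the original is abandoned.
git commit -q --amend -m "reworded message"
copy=$(git rev-parse HEAD)

# The reflog still remembers where HEAD pointed before the copy was made.
rescued=$(git rev-parse 'HEAD@{1}')
git log -1 --format='%h %s' "$rescued"
```

After a real rebase, the same idea applies: git reflog lists the earlier positions of HEAD, and any of them can be given a branch name to rescue it.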

Let's start by re-drawing the graph that git log --graph --oneline shows (which is useful and has lots of good information) in a horizontal format that discards some information. This might ordinarily be less useful, but it should make it easier to "see the forest" (by ignoring some of the trees, as it were). I am not sure which branch you are actually on, so I will just use the label somebranch (your current branch may well be master, which is my guess, but I can't prove it from the log output you've quoted).

     o      o      B--C
    / \    / \    /    \
o--o   o--o   o--A      F--G   <-- somebranch
    \ /    \ /    \    /
     o      o      D--E

All the o commits are the earliest ones (the lowest rows) in your git log output, while the labeled commits A through G are the seven commits at the top. Commit G (our one-letter code standing in for 80e2fa1) is the one you want to "squash away" into commit C (really 6077ae7). Commit F is the merge you say you would like to discard (really 7850013).

Now, the simplest form of git rebase is just git rebase with no arguments at all, but that's probably not what we want. Instead, we need to copy, and hence flatten away, some commits starting with either B--C or with D--E, so as to see this result:

     o      o
    / \    / \
o--o   o--o   o--A--D--E--B'-C'  <-- somebranch
    \ /    \ /
     o      o

or this one:

     o      o
    / \    / \
o--o   o--o   o--A--B--C'-D'-E'  <-- somebranch
    \ /    \ /
     o      o

Each of the commits with a tick-mark (like C') implies that the commit was made by copying-and-changing the original.

We might not need to change B at all. The original B has A as its parent commit. If we don't change B, though, we'll have to copy D to D'. The obvious difference between D and D' is that D' has C' as its parent. This means we also have to copy E to E', which is a lot like E but has D' as its parent.

Alternatively, we might be able to leave D and E alone entirely, by copying B to B' (with E as its parent).

In any case we definitely have to copy C to C', which is like C but (1) maybe has a different parent and (2) has G squashed-in.

The actual result will be one of these more complicated graphs, which is like our desired result, but more cluttered up and thus harder to view:

     o      o      B--C
    / \    / \    /    \
o--o   o--o   o--A      F--G    [abandoned]
    \ /    \ /    \    /
     o      o      D--E--B'--C'  <-- somebranch

(compare this carefully to the first "possible desired result"), or:

                     C'-D'-E'  <-- somebranch
                    /
     o      o      B--C
    / \    / \    /    \
o--o   o--o   o--A      F--G   [abandoned]
    \ /    \ /    \    /
     o      o      D--E

(compare this carefully to the second one).

Which one we will actually get depends on what instructions we give to git rebase --interactive.

Getting there

The command you're supplying, git rebase --interactive 6077ae7, is not quite right, because 6077ae7 is the commit we've been drawing here as "commit C". The argument you pass to git rebase is a commit it should exclude, and by default, the new commits go after that point. We must have commit C (really 6077ae7) in the instruction sheet, so that we can move commit G after it and make it a squash. (The first operation can never be squash.) The commit we want our rebase to exclude, and also have our copies go after, is actually commit A (really 7607733).

Hence the desired command is:

git rebase -i 7607733

which tells Git to make up an instruction sheet consisting of pick commands for every commit after A, up to the current commit (G), excluding merge commits (F).
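To preview which commits would land in the instruction sheet, one option is to ask git log for the same range with merges left out. The sketch below rebuilds the A..G shape from the drawings in a throwaway repository; the tag A, the branch name side, and the commit helper are all hypothetical names for the demo, and every commit touches its own file so nothing can conflict.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: one new file per commit, subject equals the letter.
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit A
git tag A                      # stands in for 7607733
git checkout -q -b side
commit D
commit E
git checkout -q -              # back to the branch we started on
commit B
commit C
git merge -q --no-edit side    # this merge commit plays the role of F
commit G

# Everything after A up to HEAD, with merge commits left out.
candidates=$(git log --no-merges --format=%s A..HEAD | sort | paste -s -d ' ' -)
echo "$candidates"             # prints: B C D E G
```

In your own repository the equivalent preview would be git log --oneline --no-merges 7607733..HEAD. Treat it as an approximation of the instruction sheet, since rebase applies a few extra rules of its own when it builds the list.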

It's not totally clear to me whether B--C will show up first, or after D--E, in the instruction sheet, but it doesn't really matter: all of B, C, D, E, and G will be in there, in some order, with B definitely before C, D definitely before E, and G last. So this will be either:

pick bc882a2 yada yada yada
pick 6077ae7 Into this one
pick 4aee207 yede yede yede
pick b645616 yide yide yide
pick 80e2fa1 I want to squash this commit

Or:

pick 4aee207 yede yede yede
pick b645616 yide yide yide
pick bc882a2 yada yada yada
pick 6077ae7 Into this one
pick 80e2fa1 I want to squash this commit

No matter which order they show in, it is now your job to re-arrange them so that 80e2fa1 comes right after 6077ae7. (In the second case it already does, in the first you must either move it up, or reshuffle the other pick lines.) Then you can change the pick for 80e2fa1 to squash, write out the instruction sheet, and exit your editor, and rebase will begin its copy (and squash) operations.
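If you want to rehearse this before touching the real history, here is a sketch that does the whole thing unattended in a throwaway repository. The hand edit is replaced by a small script passed through GIT_SEQUENCE_EDITOR. The script name edit-todo.sh, the tag A, the branch side, and the commit helper are all made up for the demo, and the script assumes the default "pick <hash> <subject>" style of instruction line.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: one new file per commit, subject equals the letter.
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit A
git tag A
git checkout -q -b side
commit D
commit E
git checkout -q -
commit B
commit C
git merge -q --no-edit side    # merge commit F
commit G

# Stand-in for the manual edit: move the line for G to just after the
# line for C and change its command from pick to squash.
cat > "$work/edit-todo.sh" <<'EOF'
#!/bin/sh
set -e
todo=$1
squash_line=$(grep -E '^pick [0-9a-f]+ .*G$' "$todo" | sed 's/^pick/squash/')
awk -v s="$squash_line" '
    /^pick [0-9a-f]+ .*G$/ { next }      # drop G from its old position
    { print }
    /^pick [0-9a-f]+ .*C$/ { print s }   # re-insert it right after C
' "$todo" > "$todo.new"
mv "$todo.new" "$todo"
EOF

# GIT_EDITOR=true accepts the combined squash message without prompting.
GIT_SEQUENCE_EDITOR="sh '$work/edit-todo.sh'" GIT_EDITOR=true \
    git rebase -q -i A

git log --oneline A..HEAD
```

The result should be a straight line of four commits after A, with no merge, and with G's change folded into the copy of C. Whether that line runs B, C', D', E' or D, E, B', C' depends on the order in which Git wrote the instruction sheet.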

Additional information

Note that rebase will now copy all of B, C, D, and E even if we're going to copy B and C after E. This seems like it might contradict our drawings above, where we only made the copies D' and E' if we had to, in order for them to come after C'.

The trick here is actually a key Git property: If a commit is copied to a bit-for-bit identical version of itself, the new commit has the same ID as the original commit. If even a single thing in the copy is different, though, the new commit has a different ID.
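This property is easy to check with the low-level git commit-tree command. The sketch below is a demo under stated assumptions: a throwaway repository, the empty tree, and both timestamps pinned to an arbitrary fixed date so that two commits can be byte-for-byte identical. The variable names are made up.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

# Pin both timestamps so that repeated commits can be byte-identical.
GIT_AUTHOR_DATE="2000-01-01 00:00:00 +0000"
GIT_COMMITTER_DATE="2000-01-01 00:00:00 +0000"
export GIT_AUTHOR_DATE GIT_COMMITTER_DATE

tree=$(git mktree </dev/null)                 # the empty tree
root=$(git commit-tree "$tree" -m "root")

# Same tree, parent, message, identity, and dates: same ID.
first=$(git commit-tree "$tree" -p "$root" -m "same")
again=$(git commit-tree "$tree" -p "$root" -m "same")

# Only the parent differs: different ID.
other=$(git commit-tree "$tree" -p "$first" -m "same")

echo "first: $first"
echo "again: $again"
echo "other: $other"
```

Here first and again come out equal, while other differs even though its tree, message, and dates are unchanged. This is why a copy placed on a different parent always gets a new ID.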

The data that go into a commit include the author and committer, and some time stamps. The time "now" is different from the time "a few seconds ago", so it would seem that a copy must always come out changed. If the commit really were re-made, it would be; but the git rebase code first checks whether it can get away with preserving the original commit intact, by just re-using it as is. There's almost certainly nothing to stop it from leaving the original D--E chain intact if the new result goes A--D--E.

To some extent, this doesn't matter: for your purposes, you probably don't care if you get A-D'-E'-B'-C' vs A-D-E-B'-C'. If you go for the A-B-C'-D'-E' version you probably would not care if Git produced A-B'-C'-D'-E' anyway. But the rebase code does try to preserve the original if it can, because for some cases, the hash IDs really do matter.

(If we desperately needed to preserve the A-D-E IDs, and git rebase insisted on changing them by copying to new timestamps, we could do what we wanted by using git cherry-pick to build our new chain of commits atop commit E. But in fact we can just let git rebase use its special-case "try to preserve IDs" code. If anything, the problem is sometimes the opposite: sometimes, albeit rarely, we desperately need to avoid preserving IDs, and this is where git rebase -i has a --no-ff flag, which tells it to avoid fast-forwarding the branch pointer, i.e., to avoid preserving IDs.)
