Blumer - 9 days ago
Git Question

What happened to this Git repo and how can I restore sanity to it?

I returned from a conference and have found a delightful oddity in the master branch of a critical project. A number of files I had committed prior to the conference no longer show up at the head of the master.


If I look at the history of the directory containing the missing files through either TFS's web interface or the integration in Visual Studio, the last commit to anything in the directory is me adding several of the missing files. There are no subsequent commits removing them.

If I navigate the project through the TFS web interface, my files do not exist in the directory.

If I examine them from the command line with git log [directory], the last commit that comes up is one from back on August 24th, prior to the addition of the missing files. Five commits are missing.

If I execute git log . at the top level of the project, these five commit messages do appear.

There are two commits to the project that were done by a currently-unavailable coworker while I was gone. They do not touch the same part of the project that the files are missing from. There is a good chance the coworker last pulled from the master after August 24th, but prior to the five subsequent commits.


From my limited understanding, I am theorizing that the coworker ran into conflicts when trying to bring his changes in and, rather than resolve them, may have done a git reset in order to proceed, but admittedly my knowledge of those realms of git is shaky at best.


  • Does that fit the symptoms described above?

  • Are there things in logs I can look for that would indicate such an action?

  • If that is the likely cause, how can I recover the five-or-so commits that appear to have been undone?


Short version

  1. Yes.

  2. Use --full-history, and optionally also -m and more flags if desired.

  3. You may be able to use git cherry-pick to re-copy the original commits to new, post-merge commits; or it may be simpler to do some of it manually. You can use git show -m on any given "bad" merge to get diffs against both parents, and then hand-apply dropped changes, or even massage the diffs and use git apply. Or, you can check out the commit just before the bad merge, manually re-do the merge yourself, diff the tree for the "wrong" merge against the tree for the corrected one, and hence come up with a patch that might apply to the commit chain you have now.

Long version of point 2

Use git log --full-history -- <path> to avoid dropping some commits that touch the given path. Without --full-history, Git tends to discard merges that modify the file from one side only, e.g., that have, on a per-file basis, resolved conflicts using an ours-or-theirs-only strategy. (This seems like a bug of sorts to me: Some cases of commit-limiting, as with paths here, probably should include some or even all of the merges that combined diff explicitly discards. But that's a philosophical change / "bug", rather than an outright "clearly not correct behavior" bug.)
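To make the effect concrete, here is a minimal sketch that rebuilds the symptom in a throwaway repository and compares the two listings. Everything in it (the file names, the commit messages, the branch names branch and topic, and the way the merge is sabotaged) is a made-up assumption for illustration, not taken from the question's project.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch   # hypothetical main branch name
git config user.name "Example"
git config user.email "example@example.com"

# Merge base: one file under dir/, one unrelated file.
mkdir dir
echo "old" > dir/old.txt
echo "x" > other.txt
git add .
git commit -q -m "base: add dir/old.txt"

# Topic side: commit A adds a file under dir/.
git checkout -q -b topic
echo "new" > dir/new.txt
git add dir/new.txt
git commit -q -m "A: add dir/new.txt"

# Main side: commit E touches only the unrelated file.
git checkout -q branch
echo "y" > other.txt
git commit -q -am "E: unrelated change"

# Faulty merge M: the file from the topic side is dropped before committing.
git merge -q --no-commit topic
git rm -q -f dir/new.txt
git commit -q -m "M: faulty merge"

# Compare the default, simplified history with --full-history.
plain=$(git log --format=%s -- dir)
full=$(git log --full-history --format=%s -- dir)
printf 'default:\n%s\n\n--full-history:\n%s\n' "$plain" "$full"
```

In this sketch the default listing stops at the base commit, because the merge looks identical to its first parent as far as dir/ is concerned, while the --full-history listing brings commit A back into view.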

Add -m --name-status to observe modifications in the merge commits themselves. Add --merges as well to observe only merges:

git log --full-history -m --name-status --merges -- <path>...

You may replace --name-status with -p to have git log show diffs. Or, once you have found suspicious merges (by their hash IDs), use git show -m <hash> to view them in detail.

Note that with -m, the output from git log and git show for merges changes a bit:

commit <sha1> (from <parentN>)
Merge: <parent1> <parent2>
Author: ...

What the -m flag does here is to split the merge, for diff purposes, into multiple virtual commits, one for each parent. The diff is then done against that specific parent.
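As a rough illustration of that per-parent output, the sketch below builds a small throwaway repository containing one merge that silently drops a file, then runs the log command from above on it. The repository layout, file names, and commit messages are all hypothetical.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch   # hypothetical main branch name
git config user.name "Example"
git config user.email "example@example.com"

mkdir dir
echo "old" > dir/old.txt
echo "x" > other.txt
git add .
git commit -q -m "base: add dir/old.txt"

git checkout -q -b topic
echo "new" > dir/new.txt
git add dir/new.txt
git commit -q -m "A: add dir/new.txt"

git checkout -q branch
echo "y" > other.txt
git commit -q -am "E: unrelated change"

# Faulty merge: the topic side's new file is removed before committing.
git merge -q --no-commit topic
git rm -q -f dir/new.txt
git commit -q -m "M: faulty merge"

# Show merges only, with one diff per parent, limited to dir/.
out=$(git log --full-history -m --name-status --merges -- dir)
printf '%s\n' "$out"
```

The entry marked as coming from the topic-side parent should list dir/new.txt as deleted, which is the sign that the merge itself discarded the file.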

Scenario (or how you got here, or long version of point 1)

All too often, someone runs git merge and mis-handles merge conflicts by simply choosing "version on the left" or "version on the right" instead of "mostly from the base, with a little from the left as appropriate, and a little from the right as needed". The result is that the conflicted file loses all the changes from one half of the merge.

Sometimes that is the right thing to do, and sometimes it's not. Git produced a conflict because at least some of the changes conflicted. When some of the changes conflict, it's quite possible that only "column A" or "column B" has the right answer. But we might have non-conflicting changes in the same file, on different lines.

For instance, consider this fragment of a (rather artificial) text file:

  #       base             column A           column B

 94:   We show that       We show that       We show that
 95:   2 + 2 = 5.         2 + 2 = 3.         2 + 2 = 4.
102:   This moth proof... This math proof... This moth proof...

The "base" version has two errors, one being the claim that 2 + 2 = 5 and the other calling it a moth proof.

The version on one branch, in "column A", tries to correct it: 2+2 is not five, and it's a math (or "maths") proof, not a moth proof.

The version on the other branch, "column B", does correct it—but the author of that version missed the bug on line 102.

The correct merge strategy, in this case, is to take line 95 from the second changed version, and line 102 from the first changed version. Some merge tools make this easy to do correctly—I myself use vim on a file with merge conflict markers with merge.conflictstyle = diff3, and hence don't even have to see line 102 where there was no conflict—and some merge tools try to show you a global-ish view, making it far too easy to eyeball this and say "oh, just use the second parent" (column B).
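To see what the diff3 conflict style looks like on this example without building a repository, a minimal sketch can use git merge-file on three plain files. The file names and the filler lines standing in for lines 96 to 101 are assumptions; column B is written as the prose describes it, with the typo on line 102 left unfixed.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# make <file> <sum line> <proof line>: writes one version of the text.
make() {
    {
        echo "We show that"
        echo "$2"
        for n in 96 97 98 99 100 101; do echo "filler line $n"; done
        echo "$3"
    } > "$1"
}

make base.txt     "2 + 2 = 5." "This moth proof..."
make column_a.txt "2 + 2 = 3." "This math proof..."
make column_b.txt "2 + 2 = 4." "This moth proof..."

# merge-file exits with the number of conflicts, so keep set -e from aborting.
conflicts=0
git merge-file -p --diff3 column_a.txt base.txt column_b.txt > merged.txt || conflicts=$?
cat merged.txt
echo "conflicts: $conflicts"
```

Only the sum line should end up between conflict markers, with the base text shown after the ||||||| marker; the typo fix that one side alone made on the last line is merged without asking.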

In short, your theory is probably correct (although git reset was probably not the culprit). To find out for sure, though, you would have to observe the same person re-doing that same merge—assuming he or she has not yet learned to do the merge correctly!

Long version of point 3

This works best, I think, with a drawing of some of the commit graph. You can get this from various tools, including git log --graph (optionally with --oneline and flags like --decorate), and from visualizers like gitk. They tend to present the graph vertically, with newer commits towards the top. I like to draw them horizontally, with newer commits towards the right, for textual reasons.

Let's say that the graph currently looks something like this:

...--o--*--o--o--o--E--M--o--o--F   <-- branch
         \            /
          A--B--C--D-´   <-- topic [label may no longer exist]

Here, * is the merge base for faulty merge M. At the time someone broke things, the tip of the more main branch was commit E and the tip of the topic branch was commit D. The person who did the merge was on branch branch, and things looked like this:

...--o--*--o--o--o--E   <-- branch
         \
          A--B--C--D    <-- topic

They ran git merge topic, or perhaps did the equivalent from some GUI that led them down the faulty-merge-garden-path. This produced merge conflicts, which they resolved inappropriately and committed.

You can re-do the entire merge. Simply check out commit E by ID, and/or give it a branch name. Using git checkout -b we can do both at once:

git checkout -b remerge <id-of-E>

(Or, you can do this in "detached HEAD" mode, which is what I would do for a quick one-off test. You can always give the detached HEAD a name with git checkout -b later.)

Now that commit E is current, simply re-run the merge:

git merge <id-of-D>

Git will perform the same merge actions it would have "back then", since it's merging the same commit into the same commit, starting from the same graph. (Note: if you have git rerere enabled, you may want to temporarily disable it here, especially if you are the one who made the faulty merge. :-) )

If you now resolve the conflicts (correctly this time) and commit the result, you'll get a new merge M2:

...--o--*--o--o--o--E--M--o--...   <-- branch
         \           \/
          \          /\
           A--B--C--D--M2   <-- remerge

You can now compare M vs M2:

git diff <id-of-M> HEAD

to see what is needed to change M into M2.
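The whole sequence can be sketched end to end in a throwaway repository. This is an illustration under assumptions: the file proof.txt, its filler lines, and the commit messages are invented, the topic branch is reduced to the single commit D, and the faulty merge is simulated by taking "ours" for the whole conflicted file.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch   # hypothetical main branch name
git config user.name "Example"
git config user.email "example@example.com"

# write_proof <sum line> <proof line>: rewrites the example file.
write_proof() {
    {
        echo "We show that"
        echo "$1"
        for n in 1 2 3 4 5 6; do echo "filler $n"; done
        echo "$2"
    } > proof.txt
}

write_proof "2 + 2 = 5." "This moth proof..."
git add proof.txt
git commit -q -m "base"

git checkout -q -b topic
write_proof "2 + 2 = 3." "This math proof..."
git commit -q -am "D: column A"

git checkout -q branch
write_proof "2 + 2 = 4." "This moth proof..."
git commit -q -am "E: column B"

E=$(git rev-parse branch)
D=$(git rev-parse topic)

# Faulty merge M: the conflict is "resolved" by keeping our whole file.
git merge -q topic >/dev/null 2>&1 || true
git checkout --ours -- proof.txt
git add proof.txt
git commit -q -m "M: faulty merge"
M=$(git rev-parse HEAD)

# Redo the merge from E, this time combining both sides by hand.
git checkout -q -b remerge "$E"
git merge -q "$D" >/dev/null 2>&1 || true
write_proof "2 + 2 = 4." "This math proof..."
git add proof.txt
git commit -q -m "M2: corrected merge"

fix=$(git diff "$M" HEAD)
printf '%s\n' "$fix"
```

The resulting diff is exactly the change that the faulty merge lost, here the moth-to-math correction.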

There are many options here

Whether or not you make M2, you can:

  • Look at commits A--B--C--D, and if you just need some or all of them, use git cherry-pick to copy them to where you are now:

    ...--o--*--o--o--o--E--M--o--o--F--A'-B'-C'-D'   <-- branch
             \            /
              A--B--C--D-´

    Here A', B', C', and D' are cherry-picked copies of A through D.

  • Try git apply-ing the diff between M and M2 atop F:

    ...--o--*--o--o--o--E--M--o--o--F--G   <-- branch
             \            /
              A--B--C--D-´

    Here G is the new commit that records the applied diff (save the git diff output in a file, or re-run the git diff and pipe it to git apply).

  • You can even merge M2 into branch now (although this will likely be quite messy: the merge base of commits F and M2 is a virtual merge of D and E, which is a bad situation since we're presupposing conflicts here in the first place!), or cherry-pick M2 as a diff against its first parent E, using git cherry-pick -m 1.
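The cherry-pick option from the list above can be sketched for the case in the question, where a merge dropped newly added files. The repository below is a throwaway with hypothetical file names and commit messages, and the topic branch is shortened to two commits, A and B.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch   # hypothetical main branch name
git config user.name "Example"
git config user.email "example@example.com"

mkdir dir
echo "old" > dir/old.txt
echo "x" > other.txt
git add .
git commit -q -m "base"

git checkout -q -b topic
echo "one" > dir/new1.txt
git add dir/new1.txt
git commit -q -m "A: add dir/new1.txt"
echo "two" > dir/new2.txt
git add dir/new2.txt
git commit -q -m "B: add dir/new2.txt"

git checkout -q branch
echo "y" > other.txt
git commit -q -am "E: unrelated change"
E=$(git rev-parse HEAD)

# Faulty merge M drops both new files; F is later, unrelated work.
git merge -q --no-commit topic
git rm -q -f dir/new1.txt dir/new2.txt
git commit -q -m "M: faulty merge"
echo "z" > other.txt
git commit -q -am "F: later work"

# Copy the topic commits (everything after the merge base) on top of F.
base=$(git merge-base "$E" topic)
git cherry-pick "$base..topic" >/dev/null
git log --oneline -n 3
```

Both files exist again at the tip of branch afterwards, carried by the copies A' and B'.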

Depending on the situation, I would probably go with cherry-picking A--B--C--D if there are not too many, or even copying them into a new topic branch and merging that; or just git apply-ing the M-vs-M2 diff.
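For the git apply route, a minimal sketch might look like the following. It assumes a throwaway repository with an invented file proof.txt, a one-commit topic branch, a faulty merge M simulated by keeping "ours", and a corrected merge M2 on a side branch named remerge; none of these names come from the original project.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch   # hypothetical main branch name
git config user.name "Example"
git config user.email "example@example.com"

# write_proof <sum line> <proof line>: rewrites the example file.
write_proof() {
    {
        echo "We show that"
        echo "$1"
        for n in 1 2 3 4 5 6; do echo "filler $n"; done
        echo "$2"
    } > proof.txt
}

write_proof "2 + 2 = 5." "This moth proof..."
git add proof.txt
git commit -q -m "base"

git checkout -q -b topic
write_proof "2 + 2 = 3." "This math proof..."
git commit -q -am "D: column A"

git checkout -q branch
write_proof "2 + 2 = 4." "This moth proof..."
git commit -q -am "E: column B"
E=$(git rev-parse HEAD)

# Faulty merge M keeps only our side; F is later, unrelated work.
git merge -q topic >/dev/null 2>&1 || true
git checkout --ours -- proof.txt
git add proof.txt
git commit -q -m "M: faulty merge"
M=$(git rev-parse HEAD)
echo "later work" > notes.txt
git add notes.txt
git commit -q -m "F: later work"

# Corrected merge M2 on a side branch.
git checkout -q -b remerge "$E"
git merge -q topic >/dev/null 2>&1 || true
write_proof "2 + 2 = 4." "This math proof..."
git add proof.txt
git commit -q -m "M2: corrected merge"

# Apply the M-vs-M2 difference on top of F and record it as G.
git checkout -q branch
git diff "$M" remerge | git apply --index
git commit -q -m "G: restore changes dropped by M"
git log --oneline -n 3
```

The patch applies cleanly here because nothing after M touched proof.txt; if later commits had changed the same lines, git apply would refuse and the diff would need hand editing.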

Using rebase to copy a topic branch to redo the merge another way

The "copy to a new topic branch" method probably deserves its own little description here, though, since git rebase is the command that does this. Here is how to use git rebase to copy topic to retopic. Let's assume we start with this:

...--o--*--o--o--o--E--M--o--o--F   <-- branch
         \            /
          A--B--C--D-´   <-- topic

First, we need a branch named retopic:

git checkout -b retopic topic

(If the name topic is gone, use the ID of commit D.) Now we have:

...--o--*--o--o--o--E--M--o--o--F   <-- branch
         \            /
          A--B--C--D-´   <-- topic, HEAD -> retopic

Now simply run git rebase --onto branch <id-of-E>. If the ID of E is not handy but the ID of commit A is, use <id-of-A>^ (note the hat suffix), which names commit *; either one works as the limit, because both exclude * and everything before it. All we're doing here is directing git rebase to copy the commits that start at A and end at D (where retopic points), placing the copies after the tip of branch.

Resolve conflicts as they occur—you may want to enable git rerere before you even begin the rebase—and when you are done you have this:

                                  A'-B'-C'-D'   <-- HEAD -> retopic
...--o--*--o--o--o--E--M--o--o--F   <-- branch
         \            /
          A--B--C--D-´   <-- topic

You can now git checkout branch and git merge --no-ff retopic to make a new merge M2 from the rebase-copied commits. (Note: some or all of A' through D' may drop out entirely during the copying, depending on what was retained in faulty merge M.)
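Put together, the rebase-and-remerge steps might look like this in a throwaway repository. As before, the file names and commit messages are hypothetical, the topic branch is shortened to two commits, and the faulty merge is simulated by deleting the topic side's files before committing.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch   # hypothetical main branch name
git config user.name "Example"
git config user.email "example@example.com"

mkdir dir
echo "old" > dir/old.txt
echo "x" > other.txt
git add .
git commit -q -m "base"

git checkout -q -b topic
echo "one" > dir/new1.txt
git add dir/new1.txt
git commit -q -m "A: add dir/new1.txt"
echo "two" > dir/new2.txt
git add dir/new2.txt
git commit -q -m "B: add dir/new2.txt"

git checkout -q branch
echo "y" > other.txt
git commit -q -am "E: unrelated change"
E=$(git rev-parse HEAD)

# Faulty merge M drops both new files; F is later, unrelated work.
git merge -q --no-commit topic
git rm -q -f dir/new1.txt dir/new2.txt
git commit -q -m "M: faulty merge"
echo "z" > other.txt
git commit -q -am "F: later work"

# Copy the topic commits onto the tip of branch under a new name.
git checkout -q -b retopic topic
git rebase -q --onto branch "$E"

# Merge the copies back with a real merge commit.
git checkout -q branch
git merge -q --no-ff --no-edit retopic
git log --graph --oneline -n 5
```

No conflicts arise in this sketch because the copied commits only add files; in a real repository the rebase may stop for conflict resolution at each copied commit.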