I returned from a conference and have found a delightful oddity in the master branch of a critical project. A number of files I had committed prior to the conference no longer show up at the head of the master.
git log [directory]
git log .
Use git log --full-history, and optionally also -m and more flags if desired.
You may be able to use git cherry-pick to re-copy the original commits to new, post-merge commits; or it may be simpler to do some of it manually. You can use git show -m on any given "bad" merge to get diffs against both parents, and then hand-apply dropped changes, or even massage the diffs and use git apply. Or, you can check out the commit just before the bad merge, manually re-do the merge yourself, diff the tree for the "wrong" merge against the tree for the corrected one, and hence come up with a patch that might apply to the commit chain you have now.
Use git log --full-history -- <path> to avoid dropping some commits that touch the given path. Without --full-history, Git tends to discard merges that modify the file from one side only, e.g., merges that have, on a per-file basis, resolved conflicts using an ours-or-theirs-only strategy. (This seems like a bug of sorts to me: some cases of commit-limiting, as with paths here, probably should include some or even all of the merges that combined diff explicitly discards. But that's a philosophical change / "bug", rather than an outright "clearly not correct behavior" bug.)

Add -m --name-status to observe modifications in the merge commits themselves. Add --merges as well to observe only merges:

    git log --full-history -m --name-status --merges -- <path>...
You may replace --name-status with -p to have git log show diffs. Or, once you have found suspicious merges (by their hash IDs), use git show -m <hash> to view them in detail.
Note that with -m, the output from git log and git show for merges changes a bit:

    commit <sha1> (from <parentN>)
    Merge: <parent1> <parent2>
    Author: ...

What the -m flag does here is to split the merge, for diff purposes, into multiple virtual commits, one for each parent. The diff is then done against that specific parent.
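To see the effect of --full-history on a faulty merge, here is a minimal sketch that builds a throwaway repository. The file name file.txt, the branch names, and the commit subjects are all made up for illustration, and the faulty merge is simulated by resolving the conflict with the "ours" side only.

```shell
set -e
# Assumption: a throwaway repository; file.txt, the branch names, and the
# commit subjects are hypothetical stand-ins for your real history.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/branch

printf 'one\ntwo\nthree\n' > file.txt
git add file.txt
git commit -q -m 'base (*)'

git checkout -q -b topic
printf 'one\nTWO from topic\nthree\n' > file.txt
git commit -q -am 'topic change (A)'

git checkout -q branch
printf 'one\ntwo from branch\nthree\n' > file.txt
git commit -q -am 'branch change (E)'

# Simulate the faulty merge: the conflict is "resolved" by keeping our side only.
git merge topic >/dev/null 2>&1 || true
git checkout --ours -- file.txt
git add file.txt
git commit -q -m 'faulty merge (M)'

# Default history simplification follows only the parent that matches the result.
default_log=$(git log --format=%s -- file.txt)
# --full-history walks every parent, so the topic commit shows up again.
full_log=$(git log --full-history --format=%s -- file.txt)
printf 'default:\n%s\n\nfull history:\n%s\n\n' "$default_log" "$full_log"

# Only the merges, diffed against each parent in turn.
git log --full-history -m --name-status --merges -- file.txt
```

In a real repository you would skip the setup and run only the last few git log commands against the path that lost its changes.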
All too often, someone runs git merge and mis-handles merge conflicts by simply choosing "version on the left" or "version on the right" instead of "mostly from the base, with a little from the left as appropriate, and a little from the right as needed". The result is that the conflicted file loses all the changes from one half of the merge.
Sometimes that is the right thing to do, and sometimes it's not. Git produced a conflict because at least some of the changes conflicted. When some of the changes conflict, it's quite possible that only "column A" or "column B" has the right answer. But we might have non-conflicting changes in the same file, on different lines.
For instance, consider this fragment of a (rather artificial) text file:
    #     base                column A            column B
    94:   We show that        We show that        We show that
    95:   2 + 2 = 5.          2 + 2 = 3.          2 + 2 = 4.
    96:
    97:
    98:
    99:
    100:
    101:
    102:  This moth proof...  This math proof...  This moth proof...
The "base" version has two errors, one being the claim that 2 + 2 = 5 and the other calling it a "moth proof".
The version on one branch, in "column A", tries to correct it: 2+2 is not five, and it's a math (or "maths") proof, not a moth proof.
The version on the other branch, "column B", does correct it—but the author of that version missed the bug on line 102.
The correct merge strategy, in this case, is to take line 95 from the second changed version, and line 102 from the first changed version. Some merge tools make this easy to do correctly (I myself use vim on a file with merge conflict markers, with merge.conflictstyle = diff3, and hence don't even have to see line 102, where there was no conflict), and some merge tools try to show you a global-ish view, making it far too easy to eyeball this and say "oh, just use the second parent" (column B).
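For a concrete look at what the diff3 conflict style shows, here is a small sketch that reproduces the artificial file above in a scratch repository. The file name paper.txt, the branch names, and the filler lines are assumptions for illustration; the file is also much shorter than the 102 lines of the example.

```shell
set -e
# Assumption: scratch repository; paper.txt and the branch names are hypothetical.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/base
git config merge.conflictstyle diff3   # show the base version inside conflicts

# Hypothetical helper: $1 is the "line 95" text, $2 the "line 102" text.
write_paper() {
    {
        echo 'We show that'
        echo "$1"
        for n in 96 97 98 99 100 101; do echo "filler line $n"; done
        echo "$2"
    } > paper.txt
}

write_paper '2 + 2 = 5.' 'This moth proof...'
git add paper.txt
git commit -q -m 'base'

git checkout -q -b column-a base
write_paper '2 + 2 = 3.' 'This math proof...'
git commit -q -am 'column A'

git checkout -q -b column-b base
write_paper '2 + 2 = 4.' 'This moth proof...'
git commit -q -am 'column B'

# The merge stops with a conflict on the sum only; the moth -> math fix
# from column A is merged in without any conflict markers.
git merge column-a >/dev/null 2>&1 || true
cat paper.txt
```

The conflicted region carries all three versions of the sum (ours, base, theirs), while the last line already reads "math", which is why taking one whole column by hand throws away a change that Git had merged correctly.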
In short, your theory is probably correct (although git reset was probably not the culprit). To find out for sure, though, you would have to observe the same person re-doing that same merge, assuming he or she has not yet learned to do the merge correctly!
This works best, I think, with a drawing of some of the commit graph. You can get this from various tools, including git log --graph (optionally with --oneline and flags like --decorate), and from visualizers like gitk. They tend to present the graph vertically, with newer commits towards the top. I like to draw them horizontally, with newer commits towards the right, for textual reasons.
Let's say that the graph currently looks something like this:
    ...--o--*--o--o--o--E--M--o--o--F   <-- branch
             \            /
              A--B--C--D-´   <-- topic [label may no longer exist]
Commit * is the merge base for faulty merge M. At the time someone broke things, the tip of the more-main branch (named branch) was commit E and the tip of the topic branch was commit D. The person who did the merge was on branch, and things looked like this:

    ...--o--*--o--o--o--E   <-- branch
             \
              A--B--C--D   <-- topic

They then ran git merge topic, or perhaps did the equivalent from some GUI that led them down the faulty-merge-garden-path. This produced merge conflicts, which they resolved inappropriately and committed.
You can re-do the entire merge. Simply check out commit E by ID, and/or give it a branch name. Using git checkout -b we can do both at once:

    git checkout -b remerge <id-of-E>

(Or, you can do this in "detached HEAD" mode, which is what I would do for a quick one-off test. You can always give the detached HEAD a name with git checkout -b later.)

Now that commit E is current, simply re-run the merge:

    git merge <id-of-D>
Git will perform the same merge actions it would have "back then", since it's merging the same commit into the same commit, starting from the same graph. (Note: if you have git rerere enabled, you may want to temporarily disable it here, especially if you are the one who made the faulty merge. :-) )
If you now resolve the conflicts (correctly this time) and commit the result, you'll get a new merge M2:

    ...--o--*--o--o--o--E--M--o--...   <-- branch
             \           \/
              A--B--C--D-´\
                        \  \
                         `--M2   <-- remerge

You can now compare M against M2 with:

    git diff <id-of-M> HEAD

to see what is needed to change M into M2.
Whether or not you make M2, you can:

- Look at commits A--B--C--D, and if you just need some or all of them, use git cherry-pick to copy them to where you are now:

      ...--o--*--o--o--o--E--M--o--o--F--A'-B'-C'-D'   <-- branch
               \            /
                A--B--C--D-´

  where A' through D' are cherry-picked copies of A through D.

- Make a new commit G by git apply-ing the diff between M and M2:

      ...--o--*--o--o--o--E--M--o--o--F--G   <-- branch
               \            /
                A--B--C--D-´

  Save the git diff output in a file, or re-run the git diff and pipe to git apply.

- You can even merge M2 in to branch now (although this will likely be quite messy: the merge base of commits F and M2 is a virtual merge of D and E, which is a bad situation since we're presupposing conflicts here in the first place!), or cherry-pick M2 against one of its parents (most usefully its first parent, E).
Depending on the situation, I would probably go with cherry-picking A--B--C--D if there are not too many, or even copying them into a new topic branch and merging that; or just git apply-ing the diff.
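The re-merge and git apply route can be sketched end to end in a scratch repository. Everything named here (file.txt, other.txt, the commit subjects, and the "correct" resolution text) is invented for illustration; in a real repository you would substitute the actual IDs of E, D and M and resolve the conflict by hand.

```shell
set -e
# Assumption: scratch repository with hypothetical files and commit subjects.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/branch

printf 'one\ntwo\nthree\n' > file.txt
git add file.txt
git commit -q -m 'base (*)'
git checkout -q -b topic
printf 'one\nTWO from topic\nthree\n' > file.txt
git commit -q -am 'topic change (D)'
git checkout -q branch
printf 'one\ntwo from branch\nthree\n' > file.txt
git commit -q -am 'branch change (E)'

# Faulty merge M: keep our side only, then carry on with later work F.
git merge topic >/dev/null 2>&1 || true
git checkout --ours -- file.txt
git add file.txt
git commit -q -m 'faulty merge (M)'
m=$(git rev-parse HEAD)
e=$(git rev-parse HEAD^1)
d=$(git rev-parse HEAD^2)
echo 'later work' > other.txt
git add other.txt
git commit -q -m 'later work (F)'

# Re-do the merge from E, this time resolving the conflict properly.
git checkout -q -b remerge "$e"
git merge "$d" >/dev/null 2>&1 || true
printf 'one\ntwo from branch, TWO from topic\nthree\n' > file.txt   # made-up resolution
git add file.txt
git commit -q -m 'corrected merge (M2)'
m2=$(git rev-parse HEAD)

# Apply the difference between M and M2 on top of F as a new commit G.
git checkout -q branch
git diff "$m" "$m2" | git apply --index
git commit -q -m 'restore changes dropped by faulty merge (G)'
```

Here the diff applies cleanly because nothing after M touched file.txt; if later commits did touch the same lines, git apply would reject the patch (git apply -3 may help), and cherry-picking is then likely the easier path.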
The "copy to a new topic branch" method probably deserves its own little description here, though, since git rebase is the command that does this. Here is how to use git rebase to copy A--B--C--D to a new branch, retopic. Let's assume we start with this:

    ...--o--*--o--o--o--E--M--o--o--F   <-- branch
             \            /
              A--B--C--D-´   <-- topic

First, we need a branch named retopic:

    git checkout -b retopic topic

(If the name topic is gone, use the ID of commit D.) Now we have:

    ...--o--*--o--o--o--E--M--o--o--F   <-- branch
             \            /
              A--B--C--D-´   <-- topic, HEAD -> retopic
Now simply run git rebase --onto branch <id-of-E>. If the ID of E is not handy but the ID of commit A is, use <id-of-A>^ (note the hat suffix) to produce the ID of commit *. All we're doing here is directing git rebase to copy commits ending at D (where retopic points), and starting from commit A (by excluding commits * and earlier, which are reachable from commit E).

Resolve conflicts as they occur (you may want to enable git rerere before you even begin the rebase), and when you are done you have this:

                                      A'-B'-C'-D'   <-- HEAD -> retopic
                                     /
    ...--o--*--o--o--o--E--M--o--o--F   <-- branch
             \            /
              A--B--C--D-´   <-- topic
You can now git checkout branch and git merge --no-ff retopic to make a new merge M2 from the rebase-copied commits. (Note: some or all of A' through D' may drop out entirely during the copying, depending on what was retained in faulty merge M.)
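As a rough end-to-end sketch of the retopic method, the script below builds a scratch repository with a two-commit topic, a faulty merge, and some later work, then rebases and re-merges. The file names, commit subjects, and the write_paper helper are hypothetical, and the choice to skip the conflicting commit (so that it "drops out") is just one possible resolution, not something the history dictates.

```shell
set -e
# Assumption: scratch repository; all names and contents are made up.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/branch

# Hypothetical helper: $1 is the "line 95" text, $2 the "line 102" text.
write_paper() {
    {
        echo 'We show that'
        echo "$1"
        for n in 96 97 98 99 100 101; do echo "filler line $n"; done
        echo "$2"
    } > paper.txt
}

write_paper '2 + 2 = 5.' 'This moth proof...'
git add paper.txt
git commit -q -m 'base (*)'

git checkout -q -b topic
write_paper '2 + 2 = 3.' 'This moth proof...'
git commit -q -am 'A: change the sum'
write_paper '2 + 2 = 3.' 'This math proof...'
git commit -q -am 'B: moth -> math'

git checkout -q branch
write_paper '2 + 2 = 4.' 'This moth proof...'
git commit -q -am 'E: fix the sum'
e=$(git rev-parse HEAD)

# Faulty merge M keeps our whole file, losing the moth -> math fix.
git merge topic >/dev/null 2>&1 || true
git checkout --ours -- paper.txt
git add paper.txt
git commit -q -m 'M: faulty merge'
echo 'later work' > other.txt
git add other.txt
git commit -q -m 'F: later work'

# Copy the topic commits (those not reachable from E) onto the tip of branch.
git checkout -q -b retopic topic
# A conflicts with E's fix; in this sketch we skip it, so only B is copied.
git rebase --onto branch "$e" >/dev/null 2>&1 || git rebase --skip >/dev/null 2>&1

# Record the recovered work as a real merge, M2.
git checkout -q branch
git merge -q --no-ff -m 'M2: merge retopic' retopic
git log --graph --oneline
```

The --no-ff matters here: retopic sits directly on top of branch after the rebase, so a plain git merge would fast-forward instead of recording a merge commit.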