meshy - 20 days ago
Git Question

Restore history of divergent, then convergent repo

I am working on restoring the history of a codebase. I have recovered commits lost at the root of my git repo, and have now discovered a new complication.

A large chunk of code was split into a separate codebase for a while... and then merged back in.

Main repo:    A -- B -- C -- D -- E
                   |         ^
Code moved:        |         |
                   V         |
Other repo:        X -- Y -- Z


When the split (and merge) occurred, the files were simply copied into the target repo, and the history was lost.

To complicate matters further, the files were slightly modified on each copy before commit, so it's likely that I will need an extra commit for those changes.

This leads me to two questions:

Will it be possible to replace commit D (which copies the files back in) with the lower branch (X-Y-Z)? (This is my priority.)

If that's possible, will it be possible to restore the history of the files created at commit X as well?

There are around 300 commits on the "other" repo, and around 5000 on the "main" repo from D onwards.

I suspect git-rebase is probably required, but ideally, I would like to make use of git-filter-branch so that I do not have to manually resolve historical merge conflicts.

Answer

You may want to set up some git replace replacements to surgically splice the history, without changing the contents of any existing commits. Then run filter-branch, as we already discussed elsewhere, to make these replacement grafts permanent.

Because git replace lets you substitute a replacement object wherever Git would normally "see" the original, and because each commit records its parent commit IDs, you can replace a single commit with a chain of several commits. For instance, if commit X is "bad" in some well-defined way (X here is a generic placeholder, not the X from your diagram; in your case the bad commit is D):

...--o--P--X--Q--o--...

then we construct a new sequence of "good" commits G1 ... Gn:

...--o--P--X--Q--o--...
         \
          G1--G2--...--Gn

(where G1's parent commit ID is P, which is our bad X commit's parent; if X needs, or deserves, multiple parents, we can set all of those in good commit G1). Then we instruct Git to "replace" X with Gn, so that traversal looks like this:

...--o--P--X- [replaced]  -Q--o--...
         \               /
          G1--G2--...--Gn

Once filtered, X vanishes entirely from the rewritten history, with Q and all later commits being rewritten as new copies in the usual filter-branch fashion.
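As a rough sketch of the replace-then-filter sequence described above, the following script builds a throwaway repository and walks through it end to end. Everything in it is made up for illustration: the file names, the commit messages, the commit_line helper, and the three-commit "good" chain. It assumes a Git new enough to know FILTER_BRANCH_SQUELCH_WARNING (older versions simply ignore it), and it is not meant as the one true way to do this.

```shell
#!/bin/sh
# Hypothetical demo in a scratch repository; all names are placeholders.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

# Helper (illustrative only): append a line to a file and commit it.
commit_line() {  # usage: commit_line <file> <text> <message>
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$3"
}

# Original history: P -- X -- Q, where X is the "bad" bulk-copy commit.
commit_line main.txt base "P"
P=$(git rev-parse HEAD)
printf 'one\ntwo\nthree\n' > copied.txt
git add copied.txt
git commit -q -m "X: bulk copy, history lost"
X=$(git rev-parse HEAD)
commit_line main.txt more "Q"
branch=$(git symbolic-ref --short HEAD)

# Build the "good" chain G1..G3 on top of P; it ends with the same tree as X.
git checkout -q -b tempbranch "$P"
commit_line copied.txt one "G1"
commit_line copied.txt two "G2"
commit_line copied.txt three "G3"
Gn=$(git rev-parse HEAD)

# Tell Git to use Gn wherever it would otherwise see X.
git replace "$X" "$Gn"
git checkout -q "$branch"
git log --oneline            # traversal now passes through G1..G3

# Make the replacement permanent, then drop the replace ref.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -- "$branch"
git replace -d "$X"
git log --oneline            # same shape, no replace ref needed any more
```

filter-branch keeps the pre-rewrite branch tip under refs/original/, so the old history (including X) stays recoverable until you delete that ref.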

To construct the "good" commits, you can literally git checkout -b tempbranch <P> and then start making the commits, though if you need to set multiple parents this is a bit more tricky (you can use git commit-tree instead of plain git commit, or cheat by creating .git/MERGE_HEAD with the remaining extra hashes in it). You may want to backdate the new "good" commits, and/or set arbitrary authors. git commit has the --date and --author switches for these (--date sets only the author date), while git commit-tree makes you use the GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL and GIT_AUTHOR_DATE environment variables; the committer fields are controlled by the matching GIT_COMMITTER_* variables in both cases.
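Here is a minimal sketch of both ways of building "good" commits, again in a scratch repository. The author identity, dates, branch names and file names are all invented for the example, and the tree given to G1 is simply whatever was staged at that moment; in a real restoration you would stage the content that commit should actually contain.

```shell
#!/bin/sh
# Hypothetical demo in a scratch repository; all names are placeholders.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

# P: the parent of the bad commit.
echo base > main.txt
git add main.txt
git commit -q -m "P"
P=$(git rev-parse HEAD)

# Z: tip of another line of history that G1 should also descend from.
git checkout -q -b other
echo extra > other.txt
git add other.txt
git commit -q -m "Z"
Z=$(git rev-parse HEAD)

# G1 via git commit-tree: two parents, author and dates from the environment.
git checkout -q -b tempbranch "$P"
echo one > copied.txt
git add copied.txt
tree=$(git write-tree)       # tree object for the currently staged content
G1=$(GIT_AUTHOR_NAME="Original Author" \
     GIT_AUTHOR_EMAIL="author@example.com" \
     GIT_AUTHOR_DATE="2015-03-01 12:00:00 +0000" \
     GIT_COMMITTER_DATE="2015-03-01 12:00:00 +0000" \
     git commit-tree "$tree" -p "$P" -p "$Z" -m "G1: first restored commit")
git reset -q --hard "$G1"    # move tempbranch to the new commit

# G2 via plain git commit: switches for the author fields,
# environment variable for the committer date.
echo two >> copied.txt
git add copied.txt
GIT_COMMITTER_DATE="2015-03-02 12:00:00 +0000" \
git commit -q -m "G2: second restored commit" \
    --author="Original Author <author@example.com>" \
    --date="2015-03-02 12:00:00 +0000"
G2=$(git rev-parse HEAD)

git log --format='%h %an %ad %s' --date=iso
```

git commit-tree only creates the commit object and prints its hash; it does not move any branch, which is why the sketch follows it with git reset --hard.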