neel - 1 month ago
Git Question

Copy multiple files and dirs from one git repo to another while keeping their original history

My requirement: Break down one git repo into multiple git repos, preserving the same directory structure as in the original repo and the commit history of the files that are copied to each new repo.
What I have tried already:


  1. First I tried git filter-branch --subdirectory-filter, based on the suggestions in http://gbayer.com/development/moving-files-from-one-git-repository-to-another-preserving-history/
    Result: The history is preserved, but it is visible only when running
    git log --follow
    Also, the original commit history cannot be seen on GitHub: it displays my merge commit as the only commit for that file, with no earlier commits. I can live with this limitation and accept it as a solution. My other concern with this approach is that, for each folder and file I want to copy, I need to clone the original repo again and repeat all 12 or 13 steps every time. Since I'm moving a lot of files around, I would like to know if there is a simpler way. Also, since the post is 5 years old, are newer, easier solutions available? (Surprisingly, Google mostly shows this blog post as the first search result.)

  2. Next I tried a suggestion from a comment on the same Greg Bayer post: http://gbayer.com/development/moving-files-from-one-git-repository-to-another-preserving-history/#comment-2685894846
    This made things a bit simpler by using git subtree split, but the results were the same as in the first case.

  3. Then I tried the git log --patch-with-stat and git am approach based on this answer: http://stackoverflow.com/a/11426261/5497551
    Result: Applying the patches usually fails with errors when a merge is encountered.
    I tried one of the suggestions on that answer, using -m --first-parent. This resolved the errors, but it does not expand merges into their individual commits; each merge is listed as a single commit, so most of the commit history is lost.
    So I added the --3way option. This went through the commits over and over and did not lead to an acceptable solution.
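
For attempt 1, a likely reason the history shows up only with git log --follow is the extra "move the files back into their directory" commit that the blog's recipe needs after --subdirectory-filter. The following is a minimal sketch on a throwaway repository, not the exact steps from the blog post; the directory sub/, the file a.txt, the commit messages, and the demo identity are all made up for illustration.

```shell
#!/bin/sh
set -eu

# Throwaway identity so commits work without any git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Newer Git versions pause with a warning before filter-branch; skip that.
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q

# Two commits that touch a file inside sub/ (hypothetical layout).
mkdir sub
printf 'line 1\n' > sub/a.txt
git add .
git commit -q -m "add sub/a.txt"
printf 'line 2\n' >> sub/a.txt
git commit -q -am "edit sub/a.txt"

# Keep only sub/; its contents become the root of the rewritten history.
git filter-branch --subdirectory-filter sub -- --all >/dev/null 2>&1

# Restore the original directory structure with a new "move" commit.
mkdir -p sub
git mv a.txt sub/a.txt
git commit -q -m "move files back under sub/"

# Without --follow, only commits that touch the path sub/a.txt are listed.
plain=$(git rev-list --count HEAD -- sub/a.txt)
# With --follow, git traces the file across the rename to its older path.
followed=$(git log --follow --format=%h -- sub/a.txt | wc -l | tr -d ' ')

echo "commits without --follow: $plain"
echo "commits with --follow:    $followed"
```

In this toy history the path-limited log should list only the move commit, while --follow also lists the two earlier commits. A path-based history view would be limited in the same way.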



In conclusion, I would prefer the 3rd solution, if only there were an option to list all the commits within a merge in the history of the new repo. Otherwise I have to stick with the first solution, which is inconvenient and tedious in my situation. Any advice or help would be greatly appreciated.

Thanks.

Answer

Here is what worked for me (combining answers from @AD7six and @Olivier) to split my orig-repo into multiple new repos. I'm listing the steps for creating only one new repo, new-repo1, but the same steps were used to create the others as well.

First, create a new empty repo on GitHub named new-repo1.

git clone [Github url of orig-repo]

git clone --no-hardlinks orig-repo new-repo1
cd new-repo1
git remote rm origin
git checkout -b master  # This step can be skipped. I had to do it because the default branch of my orig-repo was `develop`, but in new-repo1 I wanted it to be `master`.

# I used a script here to delete the files and directories not required in new-repo1.
# If you have only a few files/dirs to delete, you can run the commands below instead.
git rm <path of file 1 to be deleted>   
git rm <path of file 2 to be deleted>
git rm -rf <path of dir 1 to be deleted>

git commit -m "Deleted non-new-repo1 code"

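# Record the files that remain; only these paths are kept in the rewritten history.
git ls-files > keep-these.txt
# For every commit: empty the index, then restore only the kept paths from that commit.
git filter-branch --force --index-filter "git rm  --ignore-unmatch --cached -qr . ; cat $PWD/keep-these.txt | xargs git reset -q \$GIT_COMMIT --" --prune-empty --tag-name-filter cat -- --all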

# Drop the backup refs left by filter-branch, then expire and prune the old objects.
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now

git init  # Optional: the clone is already a repository, so this only re-initializes it.
git remote add origin [Github url of new-repo1]
git push -u origin master
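
The deletion script mentioned in the steps above is not shown, so here is one possible sketch of that step, not the script I actually used. It removes every tracked file that does not sit under a directory on a keep list. The toy repository, the paths services/api/ and docs/, and the demo identity are hypothetical placeholders; substitute the directories that belong in new-repo1.

```shell
#!/bin/sh
set -eu

# Throwaway identity so commits work without any git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Toy repository standing in for the fresh clone (all paths are made up).
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p services/api docs legacy
echo a > services/api/main.txt
echo b > docs/guide.txt
echo c > legacy/old.txt
echo d > Makefile
git add .
git commit -q -m "initial layout"

# Snapshot the tracked files first, so the index is not read while it changes.
list=$(mktemp)
git ls-files > "$list"

# Remove every tracked file that is not under a directory we want to keep.
# Note: file names with unusual characters would need `git ls-files -z`.
while IFS= read -r path; do
  case "$path" in
    services/api/*|docs/*) ;;           # hypothetical keep list
    *) git rm -q -- "$path" ;;
  esac
done < "$list"
rm -f "$list"

git commit -q -m "Deleted non-new-repo1 code"

# What is left is exactly what `git ls-files > keep-these.txt` would record.
remaining=$(git ls-files)
echo "$remaining"
```

A case pattern keeps the keep list readable for a handful of directories; for a long list, reading the patterns from a file would probably be easier to maintain.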

After these steps, I can view the history of files in new-repo1 on GitHub as well as from the command line using git log.
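
If you want to try the keep-list rewrite on a throwaway repository before running it on real code, the following is a minimal sketch. Only the filter-branch invocation mirrors the steps above; the directories keep/ and drop/, the file names, the commit messages, and the demo identity are made up, and FILTER_BRANCH_SQUELCH_WARNING is set only to skip the warning pause that newer Git versions add.

```shell
#!/bin/sh
set -eu

# Throwaway identity so commits work without any git config (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)
cd "$work"

# Toy orig-repo: one directory to keep, one to drop (hypothetical layout).
git init -q orig-repo
cd orig-repo
mkdir -p keep/lib drop
echo one > keep/lib/a.txt
echo x > drop/b.txt
git add .
git commit -q -m "add a and b"
echo two >> keep/lib/a.txt
git commit -q -am "edit a"
echo y >> drop/b.txt
git commit -q -am "edit b"

# Same sequence as in the steps above, applied to the toy repository.
cd "$work"
git clone -q --no-hardlinks orig-repo new-repo1
cd new-repo1
git remote rm origin
git rm -q -r drop
git commit -q -m "Deleted non-new-repo1 code"
before=$(git rev-list --count HEAD)

git ls-files > keep-these.txt
git filter-branch --force --index-filter \
  "git rm --ignore-unmatch --cached -qr . ; cat $PWD/keep-these.txt | xargs git reset -q \$GIT_COMMIT --" \
  --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1

# Commits that only touched drop/ are now empty and get pruned.
kept=$(git rev-list --count HEAD)
echo "commits before: $before, after: $kept"

# The kept file retains its commits at its original path, no --follow needed.
git log --format='%h %s' -- keep/lib/a.txt
```

In this toy history, the commit that only edited drop/b.txt and the deletion commit both become empty, so they should be pruned and only the two commits that touched keep/lib/a.txt remain.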