nzer0 - 25 days ago
Git Question

Moving a large number of large files in a Git repository

My repository has a large number of large files.
They are mostly text data.
Sometimes I need to move these files to another location because of refactoring or packaging.

I use the git mv command to "rename" the path of the files, but it seems inefficient: the size of the commit (the actual diff size) is huge, the same as doing rm followed by git add.


Are there other ways to reduce the commit size? Or should I just add the files to .gitignore and upload them upstream as a zip file?




Thank you for the answers.

FYI, the following series of commands prints the size of the file bar:


git mv foo bar
git commit -m "modify"
git cat-file -s HEAD:bar


From this I concluded that git did an rm followed by an add.
Could you tell me whether this number is related to the actual commit size or not?

Answer Source

By design, if you move a file inside a Git repository without changing its content, the new commit only stores new metadata (tree objects) to represent the new file location. Since the content is unchanged, Git doesn't need to create a new blob object to store it, so the "commit size" should be rather small.
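One way to check this is to count the loose objects before and after the move. The sketch below works in a throwaway repository; the file names foo and bar come from the question, while the seq-generated data and the demo identity are just stand-ins.

```shell
#!/bin/sh
set -e

# Throwaway repository so nothing real is touched
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

# Stand-in for a large text data file (a bit under 1 MB)
seq 1 150000 > foo
git add foo
git commit -q -m "add foo"

# Loose objects so far: one blob, one tree, one commit
before=$(git count-objects | cut -d' ' -f1)

git mv foo bar
git commit -q -m "move foo to bar"

# The move should add a tree and a commit, but no blob
after=$(git count-objects | cut -d' ' -f1)

echo "objects before move: $before"
echo "objects after move:  $after"
echo "new objects:         $((after - before))"
```

In this single-file, top-level case the move adds two small objects (the new root tree and the commit). A file nested in subdirectories would add one new tree per directory on the changed path, but still no new blob.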

Since you say that the diff size is huge, I suppose that some file content is modified along with the relocation. That would explain a huge "commit size".
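To see whether content changed along with a move, you can ask Git for the rename status of the commit. This is a sketch in a throwaway repository (foo and bar are the names from the question, the data is a stand-in); in a real repository only the git diff line is needed.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

seq 1 150000 > foo
git add foo
git commit -q -m "add foo"

git mv foo bar
git commit -q -m "move foo to bar"

# -M turns on rename detection; --name-status prints one line per path
# with a status letter, and for renames a similarity percentage
rename_line=$(git diff -M --name-status HEAD~1 HEAD)
echo "$rename_line"

# First tab-separated field is the status, e.g. R100 for an exact rename
status=$(printf '%s\n' "$rename_line" | cut -f1)
```

A pure move is reported as R100. A lower number after the R, or a separate D and A pair, suggests the content was modified in the same commit, in which case Git does have to store a new blob.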

In both cases, you can try to shrink the .git directory with the command git gc --prune --aggressive
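To see what gc does to the object store, compare git count-objects -v before and after. A minimal sketch in a throwaway repository follows; it runs git gc --aggressive --quiet and leaves pruning at its default setting, which is a simplification of the command above.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

seq 1 150000 > foo
git add foo
git commit -q -m "add foo"
git mv foo bar
git commit -q -m "move foo to bar"

# "count" is the number of loose objects, "in-pack" the number of packed ones
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

git gc --aggressive --quiet

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')

echo "loose objects before gc: $loose_before"
echo "loose objects after gc:  $loose_after"
echo "packed objects after gc: $packed_after"
```

Running git count-objects -vH in your own repository before and after gc shows the on-disk sizes in human-readable units.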

EDIT:

git mv foo bar
git commit -m "modify"
git cat-file -s HEAD:bar

These commands create a new commit, but since the content of the foo/bar file has not changed, Git won't store anything new except the new file name. In fact, in your example, git cat-file -s HEAD:foo before the rename and git cat-file -s HEAD:bar after it will give you the same result, since it's the same content (the same blob in .git/objects). The number printed by git cat-file -s is the size of that blob, not the size of the commit. I think you are misinterpreting what Git does internally. Have a look at Git objects for further explanation.
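You can confirm that both paths point at the same blob by comparing object IDs as well as sizes. The sketch below uses a throwaway repository with the question's foo and bar; the generated data is only a stand-in.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

seq 1 150000 > foo
git add foo
git commit -q -m "add foo"
git mv foo bar
git commit -q -m "modify"

# Size of the blob at each path: HEAD~1 is the commit before the move
size_foo=$(git cat-file -s HEAD~1:foo)
size_bar=$(git cat-file -s HEAD:bar)

# Object ID of the blob at each path
id_foo=$(git rev-parse HEAD~1:foo)
id_bar=$(git rev-parse HEAD:bar)

echo "size of foo before move: $size_foo"
echo "size of bar after move:  $size_bar"
echo "blob id of foo: $id_foo"
echo "blob id of bar: $id_bar"
```

Identical IDs mean there is one stored object with two names in history, so the move did not duplicate the data.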

Remember that git tracks content, not files.