Delta Delta - 2 months ago 15

Git: migrated repo is way smaller than the original

I have a repository stored on the filesystem that I need to migrate to an HTTPS Git repository. The issue is that the migrated repo is smaller than the original: 179 MB vs. 545 MB, to be precise.

This is what the original repo looks like:

$ tree -L 2 .git

.git/
├── branches
├── config
├── FETCH_HEAD
├── HEAD
├── hooks
├── index
├── logs
│   └── refs
├── objects
│   ├── incoming_1638816568970138516.pack
│   ├── incoming_2231423675192085195.pack
│   ├── incoming_252567842603709439.pack
│   ├── incoming_2956015230264054740.pack
│   ├── incoming_3048626775278812485.pack
│   ├── incoming_3322152774343971530.pack
│   ├── incoming_3707332777993276763.pack
│   ├── incoming_407171399829023385.pack
│   ├── incoming_4072000993266381297.pack
│   ├── incoming_4293432441900999175.pack
│   ├── incoming_4833572675284287989.pack
│   ├── incoming_4943537936436869872.pack
│   ├── incoming_5555086829860720971.pack
│   ├── incoming_5912835395946639495.pack
│   ├── incoming_7273182803237175093.pack
│   ├── incoming_7510898138918506599.pack
│   ├── incoming_7865231230366160752.pack
│   ├── incoming_8724975206375007218.pack
│   ├── incoming_8787762604831244623.pack
│   ├── incoming_9046531469688239004.pack
│   ├── info
│   └── pack
└── refs
    ├── heads
    ├── remotes
    └── tags


$ git branch -a

cli
max
codefactoring
* master
new-load-configuration
new-loader
plugins_dev
remotes/origin/cli
remotes/origin/max
remotes/origin/codefactoring
remotes/origin/master

$ du -sh .
545M .


This is the migration procedure I've followed:

$ mkdir temp_dir && cd temp_dir
$ git clone --mirror /path/to/original/repo
$ cd /path/to/original/repo
$ git remote add new-origin https://myuser@my.source.server/myuser/repo.git
$ git push new-origin --mirror


Then, when I check the size of the resulting repo, it's 179 MB.

Any idea what is happening here?

Thank you.

Answer

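When a repository is transferred over a Git transport (a clone, or a push such as your git push --mirror), the objects are packed before the data is sent: Git generates a fresh, delta-compressed pack that contains only the reachable objects. That way the new repository is compactly stored while still containing all the information of the original one.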
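To see this on a small scale, here is a minimal sketch using a throwaway repository (the temporary paths and the demo identity are made up for illustration). It assumes a POSIX shell and git on PATH. Note the --no-local flag: when cloning from a plain local path, Git by default copies or hardlinks the files under objects/ as they are instead of packing them, so --no-local is used here to force the regular transport.

```shell
#!/bin/sh
set -eu

# Throwaway source repository whose objects are all still loose (no packs yet).
tmp=$(mktemp -d)
git init -q "$tmp/src"
for i in 1 2 3; do
  echo "line $i" >> "$tmp/src/file.txt"
  git -C "$tmp/src" add file.txt
  git -C "$tmp/src" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "commit $i"
done
echo "source:"
git -C "$tmp/src" count-objects -v   # expect several loose objects, packs: 0

# --no-local forces the regular Git transport even for a local path,
# so the objects arrive as one freshly generated packfile.
git clone -q --mirror --no-local "$tmp/src" "$tmp/mirror.git"
echo "mirror:"
git -C "$tmp/mirror.git" count-objects -v   # expect count: 0, packs: 1
```

Comparing the count and packs lines of the two outputs shows the loose objects of the source ending up in a single pack in the mirror.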

The original repository, however, has probably evolved over time, so its objects may be spread across many packs and are not stored as efficiently. It may also not be fully packed at all: it can still contain loose, unoptimized objects, or even objects that are no longer reachable from any branch or tag.
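One way to check which of these applies to the original repository is sketched below. inspect_repo is a hypothetical helper name, and the usage path is the placeholder from the question; the helper only reads the repository and changes nothing.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: report how a repository stores its objects.
#   count / size            -> loose objects and their disk usage
#   packs / size-pack       -> number of packfiles and their total size
#   garbage / size-garbage  -> files in the object database that are neither
#                              valid loose objects nor valid packs
# The last line printed is the number of objects not reachable from any ref,
# HEAD, or the index (reflog entries are deliberately ignored via --no-reflogs).
inspect_repo() {
  git -C "$1" count-objects -vH
  git -C "$1" fsck --unreachable --no-reflogs 2>/dev/null | wc -l
}

# Usage (path from the question):
#   inspect_repo /path/to/original/repo
```

A high count, many packs, or a non-zero last line would all be consistent with the explanation above.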

You could try running git gc (or git gc --aggressive, its more thorough variant) on the original repository to see whether it shrinks further.
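A minimal sketch of that, assuming you work on a copy of the repository: compact_repo is a hypothetical helper name and /tmp/repo-copy a made-up path. Expiring all reflog entries and pruning with --prune=now is an assumption about how far you want to go; it permanently deletes objects that only a reflog (or nothing at all) still references.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: repack a repository as tightly as stock Git allows.
# CAUTION: this discards reflog history and immediately prunes unreachable
# objects. Run it on a copy of the repository first.
compact_repo() {
  git -C "$1" count-objects -vH                      # size before
  git -C "$1" reflog expire --expire=now --all       # drop all reflog entries
  git -C "$1" gc --aggressive --prune=now --quiet    # repack into one pack, prune the rest
  git -C "$1" count-objects -vH                      # size after
}

# Usage (paths are placeholders):
#   cp -a /path/to/original/repo /tmp/repo-copy && compact_repo /tmp/repo-copy
```

Working on a copy keeps the original untouched in case something you still need was only referenced from a reflog.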

The bottom line, however, is that if the clone and the push completed without errors, the new repository contains all the information of the original one: every commit, and all of its data, that is reachable from a branch or tag is included. So you should not need to worry about it.
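If you want to double-check, one option is to compare the refs that both repositories advertise. The sketch below assumes git can reach both locations; same_refs is a hypothetical helper name and the usage arguments are the placeholders from the question. --refs leaves out HEAD and peeled tag lines, which a hosting server may present differently.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: exit 0 only if both repositories advertise exactly the
# same refs pointing at the same object IDs; fail if either cannot be listed.
same_refs() {
  refs_a=$(git ls-remote --refs "$1") || return 1
  refs_b=$(git ls-remote --refs "$2") || return 1
  [ "$(printf '%s\n' "$refs_a" | sort)" = "$(printf '%s\n' "$refs_b" | sort)" ]
}

# Usage with the two locations from the question:
#   same_refs /path/to/original/repo https://myuser@my.source.server/myuser/repo.git && echo "refs match"
```

Some hosting services add refs of their own, so a mismatch is worth inspecting with git ls-remote --refs on each side rather than being treated as data loss right away.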
