paul_h - 19 days ago
Git Question

How to reduce the depth of an existing git clone?

I have a clone. I want to reduce the history on it, without cloning from scratch with a reduced depth. Worked example:

$ git clone git@github.com:apache/spark.git
# ...
$ cd spark/
$ du -hs .git
193M .git


OK, so that's not so big, but it'll serve for this discussion. If I try git gc, it gets smaller:

$ git gc --aggressive
Counting objects: 380616, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (278136/278136), done.
Writing objects: 100% (380616/380616), done.
Total 380616 (delta 182748), reused 192702 (delta 0)
Checking connectivity: 380616, done.
$ du -hs .git
108M .git


Still pretty big, though (and git pull suggests it's still push/pullable to the remote). How about repack?

$ git repack -a -d --depth=5
Counting objects: 380616, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (95388/95388), done.
Writing objects: 100% (380616/380616), done.
Total 380616 (delta 182748), reused 380616 (delta 182748)
$ du -hs .git
108M .git
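(A throwaway-repo sketch makes it easy to check locally what repack's --depth does and doesn't touch. Nothing here is specific to Spark; it assumes only that git is on your PATH:)

```shell
# Three commits, then a repack with a tiny --depth. The commit count is
# unchanged: repack's --depth only caps delta-chain length inside the pack.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git config user.email you@example.com
git config user.name "You"
for i in 1 2 3; do
  echo "rev $i" > file.txt
  git add file.txt
  git commit -qm "commit $i"
done
before=$(git rev-list --count HEAD)
git repack -a -d -q --depth=1
after=$(git rev-list --count HEAD)
echo "commits before repack: $before, after: $after"
```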


Yup, it didn't get any smaller. That's because --depth means something different to repack than it does to clone: for repack it caps the length of delta chains inside a pack, while for clone it limits how many commits of history are fetched:

$ git clone --depth 1 git@github.com:apache/spark.git
Cloning into 'spark'...
remote: Counting objects: 8520, done.
remote: Compressing objects: 100% (6611/6611), done.
remote: Total 8520 (delta 1448), reused 5101 (delta 710), pack-reused 0
Receiving objects: 100% (8520/8520), 14.82 MiB | 3.63 MiB/s, done.
Resolving deltas: 100% (1448/1448), done.
Checking connectivity... done.
Checking out files: 100% (13386/13386), done.
$ cd spark
$ du -hs .git
17M .git


Git pull says it's still in step with the remote, which surprises nobody.
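(If you want to see the shallowness directly rather than infer it from .git's size: a shallow clone records its cut-off commits in .git/shallow. A local sketch with a throwaway "remote", assuming git on PATH; note the file:// transport, since with a plain local path git ignores --depth and warns about it:)

```shell
# Throwaway "remote" with three commits, shallow-cloned at depth 1.
# The clone ends up with a .git/shallow graft file and a single commit.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q origin
cd origin
git config user.email you@example.com
git config user.name "You"
for i in 1 2 3; do
  echo "rev $i" > file.txt
  git add file.txt
  git commit -qm "commit $i"
done
cd ..
git clone -q --depth 1 file://"$PWD"/origin shallow
cd shallow
test -f .git/shallow && echo "shallow graft file present"
git rev-list --count HEAD
```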

OK - so how to change an existing clone to a shallow clone, without nixing it and checking it out afresh?

Answer
git clone --bare --mirror --depth=5 file://"$PWD" ../temp
rm -rf .git/objects
mv ../temp/{shallow,objects} .git
rm -rf ../temp

This really isn't cloning "from scratch": it's purely local work, and it creates virtually nothing beyond the shallowed-out pack files (probably tens of kilobytes in total). I'd venture you're not going to get more efficient than this; any custom alternative would cost more space in scripts and test work than this does in a few KB of temporary repo overhead.
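(As a sanity check, the recipe can be run end to end against a throwaway repo: ten commits, shallowed in place to five. The brace expansion from the answer is spelled out for plain sh; git on PATH is assumed:)

```shell
# Build a repo with ten commits, then shallow it in place to depth 5
# using the local mirror-clone trick, and confirm the history shrank.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo
cd repo
git config user.email you@example.com
git config user.name "You"
i=1
while [ $i -le 10 ]; do
  echo "rev $i" > file.txt
  git add file.txt
  git commit -qm "commit $i"
  i=$((i + 1))
done
git clone -q --bare --mirror --depth=5 file://"$PWD" ../temp
rm -rf .git/objects
mv ../temp/shallow ../temp/objects .git
rm -rf ../temp
git rev-list --count HEAD
```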
