greggles - 4 years ago
Git Question

Remove old commit information from a git repository to save space

I have a repository for storing some large binary files (tifs, jpgs, pdfs) that is growing pretty large. There are also a fair number of files that are created, removed, and renamed, and I don't care about the individual commit history. This question is somewhat simplified because I'm dealing with a repository that has no branches and no tags.

I'm curious if there's an easy way to remove some of the history from the system to save space.

I found an old thread on the git mailing list, but it doesn't really specify how to use this command (i.e. what $drop is):

git filter-branch --parent-filter "sed -e 's/-p $drop//'" \
--tag-name-filter cat -- \
--all ^$drop

Answer Source

I think you can shrink your history by following this answer:

How to delete a specific revision of a github gist?

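Start an interactive rebase (for example git rebase -i --root, which includes the very first commit) and decide which points in history you want to keep. The todo list in the editor will look like this: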

pick <hash1> <commit message>
pick <hash2> <commit message>
pick <hash3> <commit message>   <- keep
pick <hash4> <commit message>
pick <hash5> <commit message>
pick <hash6> <commit message>   <- keep
pick <hash7> <commit message>
pick <hash8> <commit message>
pick <hash9> <commit message>
pick <hash10> <commit message>  <- keep

Then leave the first commit of each group (the very first commit, and the one right after each "keep") as "pick" and mark the others as "squash".

pick   <hash1> <commit message>
squash <hash2> <commit message>
squash <hash3> <commit message>   <- keep
pick   <hash4> <commit message>
squash <hash5> <commit message>
squash <hash6> <commit message>   <- keep
pick   <hash7> <commit message>
squash <hash8> <commit message>
squash <hash9> <commit message>
squash <hash10> <commit message>  <- keep

Then run the rebase by saving and quitting the editor. At each "keep" point, the message editor will pop up with a combined commit message covering everything from the previous "pick" up to the "keep" commit. You can either keep just the last message or combine them all to document the original history without keeping the intermediate states.
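To try the squash without touching a real repository, here is a minimal sketch that does the same thing non-interactively. Everything in it is made up for illustration: the throwaway repository, the file name file.txt, the six commits, and the choice of commits 3 and 6 as the "keep" points. It relies on GIT_SEQUENCE_EDITOR to edit the todo list in place of a real editor and on GIT_EDITOR=true to accept the default combined messages.

```shell
#!/bin/sh
set -e

# Throwaway repository so the sketch does not touch real data.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Six small commits standing in for the real history.
for i in 1 2 3 4 5 6; do
    echo "version $i" > file.txt
    git add file.txt
    git commit -q -m "commit $i"
done

# Helper that rewrites the todo list: lines 1 and 4 stay "pick",
# lines 2, 3, 5 and 6 become "squash" (keep points: commits 3 and 6).
editor=$(mktemp)
cat > "$editor" <<'EOF'
# $1 is the todo file that git hands to the sequence editor.
awk 'NR==2 || NR==3 || NR==5 || NR==6 { sub(/^pick/, "squash") } { print }' "$1" > "$1.new"
mv "$1.new" "$1"
EOF

# GIT_EDITOR=true leaves each combined commit message unchanged.
GIT_SEQUENCE_EDITOR="sh $editor" GIT_EDITOR=true git rebase -q -i --root

git log --oneline
```

With these assumptions the log should end up with two commits, one per "keep" point. On a real repository you would mark the lines by hand in the editor instead of using the helper script.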

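After that rebase, the intermediate file data will still be in the repository, but the rewritten history no longer references it. The reflog usually still does, so a plain git gc may keep the data until those reflog entries expire. Expiring the reflog first (git reflog expire --expire=now --all) and then running git gc --prune=now should get rid of that data right away.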
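The cleanup step can be sketched like this. It is only an illustration on a throwaway repository: the file name file.bin and the amended commit are stand-ins for the squashed-away intermediate commits. The sketch assumes nothing else (another branch, a tag, a stash) still points at the old commits, which matches the no-branches, no-tags repository in the question.

```shell
#!/bin/sh
set -e

# Throwaway repository; in practice the last two git commands would run
# in the repository that was just rebased.
repo2=$(mktemp -d)
cd "$repo2"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "big old data" > file.bin
git add file.bin
git commit -q -m "old state"
old=$(git rev-parse HEAD)

# Rewrite the only commit so the old one becomes unreferenced by the branch.
echo "new data" > file.bin
git add file.bin
git commit -q --amend -m "new state"

# The reflog still points at the old commit, so expire it first, then
# prune unreachable objects immediately instead of after the grace period.
git reflog expire --expire=now --all
git gc -q --prune=now

# Check whether the old commit is still in the object database.
if git cat-file -e "$old" 2>/dev/null; then
    echo "old commit still present"
else
    echo "old commit pruned"
fi
```

Note that this is destructive: once the reflog is expired and the objects are pruned, the old history cannot be recovered from this repository, so it is worth keeping a backup clone until you are happy with the result.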
