kitfox - 4 months ago

Do version control systems use diffs to store binary files?

How do popular version control systems (svn, git) handle storing revisions to a binary document? I have a project with binary sources that are periodically updated and need to be checked in (mostly Photoshop documents, a custom data format and a few word-processing documents). I've always been worried about checking in the binaries because I thought the VCS might take the simple route of uploading a new copy of the binary each time, and hence my repository would grow huge quickly.

If I have several data blocks (let's call them A, B, C, D, etc.) and a binary file that on the first check-in looks like ABC but on the second check-in has been modified to ADBE, will my VCS be smart enough to store only the changed bits, or will it create an entirely new image of the file?
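To make the block model in the question concrete, here is a toy Python sketch of a copy/insert delta: each block of the new version is either a reference to a block of the old version or a new literal block. This is a hypothetical, block-level illustration, not how any particular VCS encodes its deltas (git's pack deltas use a similar copy/insert idea, but on byte ranges). `make_delta` and `apply_delta` are made-up names.

```python
from typing import List, Tuple, Union

# Hypothetical toy delta format. Each instruction is either:
#   ("copy", index)   -> reuse block `index` from the base version
#   ("insert", block) -> store a new block literally
Instruction = Tuple[str, Union[int, str]]


def make_delta(base: List[str], target: List[str]) -> List[Instruction]:
    """Encode `target` as copy/insert instructions against `base`."""
    # Map each block to its position in the base (toy: assumes blocks are unique).
    positions = {block: i for i, block in enumerate(base)}
    delta: List[Instruction] = []
    for block in target:
        if block in positions:
            delta.append(("copy", positions[block]))  # reference old data
        else:
            delta.append(("insert", block))  # only new data is stored
    return delta


def apply_delta(base: List[str], delta: List[Instruction]) -> List[str]:
    """Rebuild the target version from the base version plus a delta."""
    out: List[str] = []
    for op, arg in delta:
        out.append(base[arg] if op == "copy" else arg)
    return out


v1 = list("ABC")   # first check-in
v2 = list("ADBE")  # second check-in
delta = make_delta(v1, v2)
print(delta)  # [('copy', 0), ('insert', 'D'), ('copy', 1), ('insert', 'E')]
assert apply_delta(v1, delta) == v2
```

In this model only D and E are stored as new data; A and B become references to the first version, and C is simply not referenced.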

qzb


Git can store binary files as diffs (deltas), but it does not do so very efficiently, so you should probably use an external tool such as Git LFS.

Slightly longer explanation

By default, git doesn't store diffs between commits. When you change a file and make a new commit, git stores an object with the content of the whole file. It doesn't matter whether you change just one line or rewrite the whole file: git doesn't store diffs, at least not at first. There is a part of git called git-gc (garbage collector) that is responsible for tasks such as removing unreachable objects (for example dangling commits) and general optimization. It runs another git command, git-repack, which does exactly what you are asking about: it takes a whole bunch of objects and stores them inside a single pack using delta compression.
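A minimal Python sketch of what "stores the content of the whole file" means: git names each blob by the SHA-1 of a `blob <size>\0` header followed by the full content (this matches `git hash-object` in a default SHA-1 repository), and a loose object on disk is that same header plus content, zlib-compressed. Flipping a single byte therefore yields a brand-new object holding the entire file; only a later repack may turn it into a delta. The helper names are hypothetical, and the zlib compression level used here is an assumption (git's is configurable), so the compressed bytes may not match git's exactly.

```python
import hashlib
import zlib


def blob_header(content: bytes) -> bytes:
    """Header git prepends to blob content: b"blob <size>\\0"."""
    return b"blob " + str(len(content)).encode() + b"\0"


def git_blob_id(content: bytes) -> str:
    """SHA-1 object ID for a blob, as `git hash-object` computes it."""
    return hashlib.sha1(blob_header(content) + content).hexdigest()


def loose_object(content: bytes) -> bytes:
    """Approximation of a loose blob on disk: zlib(header + whole content).
    Assumption: default zlib level; git's actual level is configurable."""
    return zlib.compress(blob_header(content) + content)


# ~1 MiB stand-in for a binary document, then a copy with one byte flipped.
v1 = bytes(range(256)) * 4096
v2 = bytearray(v1)
v2[10] ^= 0xFF
v2 = bytes(v2)

print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (empty blob)
print(git_blob_id(v1) != git_blob_id(v2))  # True: a completely new object
# The new loose object contains the whole file again, not just the changed byte.
print(zlib.decompress(loose_object(v2)).endswith(v2))  # True
```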

Unfortunately, packing with git-repack is not especially efficient when it comes to compressing binary files. You can tune it (for example, `git repack` accepts `--window` and `--depth` options), but if your files change a lot, or if they are really big, you should probably use an external tool such as Git LFS.