Sebastian Lenartowicz - 3 months ago

Git push resolving deltas "completed with local objects"

While I've been using git for a few years, my local git has recently been producing a new message (I'm presuming due to the growth of the repository).

When I do a git push to my remote on GitHub, I get the following output (quite natural, for the most part):

Counting objects: 99, done
Delta compression using up to 4 threads.
Compressing objects: 100% (97/97), done.
Writing objects: 100% (99/99), 10.16 KiB | 0 bytes/s, done.
Total 99 (delta 66), reused 0 (delta 0)
remote: Resolving deltas: 100% (66/66), completed with 12 local objects


The specific part I'm interested in is "completed with n local objects", which has only started appearing recently. Since the repository is growing at a fairly good clip (both in LoC and commit count), I'm assuming that this message has something to do with that, but I'm not sure if that's the case.

I'm aware that this isn't an error (my git pushes have been working properly), but I'm merely curious as to the origin and meaning of this message, and why the number is so different from the actual number of objects being counted/calculated.

Answer

Bryan Pendleton's comment has the correct answer in it: your git push made a "thin pack". All fetch and push operations over the smart protocols use thin packs, to minimize network traffic.

Any pack file uses delta compression. Normal Git pack files only delta-compress objects against other objects in the same pack (these other objects may also be delta-compressed, but only against yet more objects in the same pack). A "thin pack" is a pack file that deliberately violates this rule: it delta-compresses objects against other (loose or packed) objects stored elsewhere. Upon receiving a thin pack, a Git must "fix" the thin pack by "fattening it up" with the missing objects, or simply destroy it (exploding the thin pack into individual, not-delta-compressed, objects).
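To make "fattening up" concrete, here is a minimal sketch in Python. The pack layout, the `fix_thin_pack` helper, and the delta format here are all invented for illustration; real Git does this job inside git index-pack (its --fix-thin option), working on the actual pack file format.

```python
import hashlib

def sha(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

# Receiver's local object store: hash -> full object data.
local_objects = {}
base = b"the quick brown fox"
local_objects[sha(base)] = base

# A thin pack: the delta entry references a base object that is NOT
# shipped inside the pack itself -- the receiver is expected to have it.
thin_pack = {
    "deltas": [(sha(base), ("replace", 4, 9, b"slow"))],  # (base_hash, delta)
    "bases": {},  # full objects carried inside the pack
}

def fix_thin_pack(pack, local):
    # "Fatten up" the pack: copy every externally-referenced base from
    # the local store into the pack so it becomes self-contained.
    for base_hash, _delta in pack["deltas"]:
        if base_hash not in pack["bases"]:
            pack["bases"][base_hash] = local[base_hash]
    return pack

fix_thin_pack(thin_pack, local_objects)
```

After the fix, every delta's base lives inside the pack, so the pack no longer "violates the rule" and can be stored as an ordinary pack.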

Suppose your Git and some other Git are negotiating to send a gigabyte of data (in however many files—let's just say 1 for simplicity), but the two Gits discover that you both already have a gigabyte of file data, and the new data can be represented as: "copy the old data, delete the letter a from the middle, and insert the instead", or something equally short and simple. Whichever Git is doing the sending makes a delta-compressed object saying "starting from object with hash h, delete 1 byte at offset x, add 3 bytes the at offset x". This delta-compressed object takes a lot of CPU time—maybe even a whole second—to figure out, but just a few dozens of bytes of space. The resulting pack file is tiny and goes across the wire in microseconds. The receiving Git fattens it up by adding the missing 1GB object, and the transfer is complete.
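The copy-and-insert delta described above can be sketched like this (a deliberately simplified, hypothetical instruction format, not Git's actual xdelta-style encoding):

```python
# A delta is a short script of instructions run against a base object:
# ("copy", offset, length) takes bytes from the base,
# ("insert", data) adds literal bytes.
def apply_delta(base: bytes, delta) -> bytes:
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, off, length = op
            out += base[off:off + length]
        else:
            out += op[1]
    return bytes(out)

base = b"delete the letter a from the middle"
# "delete 1 byte at offset x, add 3 bytes 'the' at offset x":
x = base.index(b"a from")
delta = [("copy", 0, x),
         ("insert", b"the"),
         ("copy", x + 1, len(base) - x - 1)]
new = apply_delta(base, delta)
print(new)  # -> b'delete the letter the from the middle'
```

The delta is a handful of bytes even when the base is a gigabyte, which is why the resulting pack file is tiny.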

In this particular case, completed with 12 local objects means the thin pack relied on 12 objects your Git told their Git you already had. Because of Git's DAG, your Git may be able to tell their Git that you have these objects by sending just one hash ID: if you have commit C, you have every tree and blob that commit C has, and—as long as you don't have a "shallow" repository—you also have every ancestor to commit C, and every tree and blob that goes with those ancestor commits.
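That reachability argument can be sketched as a toy graph walk (hypothetical object names standing in for real hash IDs; a full, non-shallow repository is assumed):

```python
# Toy object DAG: each commit points to one tree and some parents;
# each tree points to its blobs.
commits = {
    "C3": {"tree": "T3", "parents": ["C2"]},
    "C2": {"tree": "T2", "parents": ["C1"]},
    "C1": {"tree": "T1", "parents": []},
}
trees = {"T1": ["B1"], "T2": ["B1", "B2"], "T3": ["B2", "B3"]}

def reachable_from(commit):
    # Everything "I have commit X" implies: the commit itself, its tree
    # and blobs, and (recursively) every ancestor commit's objects.
    have, todo = set(), [commit]
    while todo:
        c = todo.pop()
        if c in have:
            continue
        have.add(c)
        t = commits[c]["tree"]
        have.add(t)
        have.update(trees[t])
        todo.extend(commits[c]["parents"])
    return have

print(sorted(reachable_from("C3")))
# -> ['B1', 'B2', 'B3', 'C1', 'C2', 'C3', 'T1', 'T2', 'T3']
```

Sending the single ID "C3" tells the other side you have all nine objects, which is why the negotiation is so cheap.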

This kind of compression is thus a straightforward consequence of graph theory. It's also why, even for very large projects, the initial clone may be slow, but most git fetch updates tend to be quite fast. The main exception to this rule is when you give Git data objects that do not delta-compress well against previous data objects. This includes already-compressed binary files, such as JPG images or compressed tarballs. (Ironically, uncompressed tarballs could, in theory at least, compress much better, although Git's modified xdelta did not do a great job on them in a few cases I tested in the past.)
