This has probably never happened in the real world yet, and may never happen, but consider this: say you have a git repository, make a commit, and get very, very unlucky: one of the blobs ends up having the same SHA-1 as another object that is already in your repository. The question is, how would git handle this? Simply fail? Find a way to link the two blobs and check which one is needed according to the context?
More a brain-teaser than an actual problem, but I found the issue interesting.
I ran an experiment to find out exactly how Git would behave in this case. This is with version 2.7.0~rc0+next.20151210. I basically just reduced the hash size from 160 bits to 4 bits by applying the following diff and rebuilding git:
    --- git-2.7.0~rc0+next.20151210.orig/block-sha1/sha1.c
    +++ git-2.7.0~rc0+next.20151210/block-sha1/sha1.c
    @@ -246,6 +246,8 @@ void blk_SHA1_Final(unsigned char hashou
             blk_SHA1_Update(ctx, padlen, 8);
     
             /* Output hash */
    -        for (i = 0; i < 5; i++)
    -                put_be32(hashout + i * 4, ctx->H[i]);
    +        for (i = 0; i < 1; i++)
    +                put_be32(hashout + i * 4, (ctx->H[i] & 0xf000000));
    +        for (i = 1; i < 5; i++)
    +                put_be32(hashout + i * 4, 0);
     }
Then I did a few commits and noticed the following.
For #2, you will typically get an error like this when you run "git push":
    error: object 0400000000000000000000000000000000000000 is a tree, not a blob
    fatal: bad blob object
    error: failed to push some refs to origin
Or, if you delete the file and then run "git checkout file.txt", you get:

    error: unable to read sha1 file of file.txt (0400000000000000000000000000000000000000)
For #4 and #6, you will typically get an error like this:
    error: Trying to write non-commit object f000000000000000000000000000000000000000 to branch refs/heads/master
    fatal: cannot update HEAD ref
when running "git commit". In this case you can typically just run "git commit" again, since this will create a new hash (because the timestamp has changed).
For #5 and #9, you will typically get an error like this:
fatal: 1000000000000000000000000000000000000000 is not a valid 'tree' object
when running "git commit".
If someone tries to clone your corrupt repository, they will typically see something like:
    git clone (one repo with collided blob, d000000000000000000000000000000000000000 is commit, f000000000000000000000000000000000000000 is tree)
    Cloning into 'clonedversion'...
    done.
    error: unable to read sha1 file of s (d000000000000000000000000000000000000000)
    error: unable to read sha1 file of tullebukk (f000000000000000000000000000000000000000)
    fatal: unable to checkout working tree
    warning: Clone succeeded, but checkout failed.
    You can inspect what was checked out with 'git status'
    and retry the checkout with 'git checkout -f HEAD'
What "worries" me is that in two cases (2 and 3) the repository becomes corrupt without any warning, and in three cases (1, 7 and 8) everything seems OK, but the repository content is different from what you expect it to be. People cloning or pulling will get different content from what you have. Cases 4, 5, 6 and 9 are OK, since Git stops with an error. I suppose it would be better if it failed with an error in all cases.