Jun-Dai Bates-Kobashigawa - 1 month ago
Git Question

How much of a Git SHA is *generally* considered necessary to uniquely identify a change in a given codebase?

If you're going to build, say, a directory structure where a directory is named for a commit in a Git repository, and you want it to be short enough to make your eyes not bleed, but long enough that the chance of it colliding would be negligible, how long a prefix of the SHA is generally required?

Let's say I want to uniquely identify this change: https://github.com/wycats/handlebars.js/commit/e62999f9ece7d9218b9768a908f8df9c11d7e920

I can use as little as the first four characters:
https://github.com/wycats/handlebars.js/commit/e629

But I feel like that would be risky. Assuming a codebase that, over a couple of years, might have, say, 30k changes, what are the chances of a collision if I use 8 characters? 12? Is there a number that's generally considered acceptable for this sort of thing?

Answer

This question is actually answered in Chapter 7 of the Pro Git book:

Generally, eight to ten characters are more than enough to be unique within a project. One of the largest Git projects, the Linux kernel, is beginning to need 12 characters out of the possible 40 to stay unique.

Seven hex characters is Git's traditional default for a short SHA (since Git 2.11 the default length scales with the number of objects in the repository), so that's fine for most projects. The kernel team has had to increase theirs over time, as the quote mentions, because they have several hundred thousand commits. So for your ~30k commits, 8 or 10 characters should be perfectly fine.
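To put rough numbers on the "chances of collision" part of the question, here is a minimal birthday-problem sketch. It assumes SHA-1 prefixes behave like independent, uniformly random hex strings, and `collision_probability` is a hypothetical helper name, not part of Git. It estimates the chance that *at least two* of `n` commits share the same `k`-character prefix as 1 - exp(-n(n-1) / (2 * 16^k)).

```python
import math


def collision_probability(num_commits: int, hex_chars: int) -> float:
    """Approximate chance that at least two of `num_commits` commits share
    the same `hex_chars`-character SHA prefix.

    Assumption: prefixes are independent and uniform over 16**hex_chars
    values (the standard birthday-problem model), so
    p ~= 1 - exp(-pairs / space).
    """
    space = 16 ** hex_chars                          # distinct prefixes of this length
    pairs = num_commits * (num_commits - 1) / 2      # number of commit pairs
    # -expm1(-x) == 1 - exp(-x), but stays accurate when x is tiny
    return -math.expm1(-pairs / space)


if __name__ == "__main__":
    # Hypothetical repository size taken from the question: ~30k commits
    for k in (7, 8, 10, 12):
        print(f"{k:>2} chars: {collision_probability(30_000, k):.2e}")
```

Under that model, for 30k commits this gives roughly a 10% chance that *some* pair shares an 8-character prefix, about 0.04% at 10 characters, and about 1.6e-6 at 12. Git's own abbreviations don't suffer from this, because `git rev-parse --short=<length>` lengthens a prefix until it is unambiguous; a fixed-length directory name you truncate yourself gets no such protection, so the figures above are the ones that matter for that scheme.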