Stefan Schouten - 1 month ago
Git Question

How do I know if a git commit has been changed?

Someone has committed something a few months ago. After that, multiple other commits have been done. Is it possible to see if someone has changed the contents of that certain commit by amending or by rebasing? If yes, how?

Answer

A commit, in Git, is never changed. Neither git rebase nor git commit --amend ever changes any commit, because this is not possible.[1]

The trick here lies in defining "a commit". How do you know which commit is which? If I say "a commit in the Git repository for Git", well, there are over 40,000 commits in there. Which one do I mean?

The unambiguous and definite way for me to tell you is for me to give you the hash ID, e.g., 9b7cbb315923e61bb0c4297c701089f30e116750. That is the true name for one specific commit:

$ git cat-file -p 9b7cbb315923e61bb0c4297c701089f30e116750 | sed 's/@/ /'
tree 4ba58c32960dcecc1fedede9c9362f5c10158f08
parent 77933f4449b8d6aa7529d627f3c7b55336f491db
author Junio C Hamano <gitster pobox.com> 1418845774 -0800
committer Junio C Hamano <gitster pobox.com> 1418845774 -0800

Git 2.2.1

Signed-off-by: Junio C Hamano <gitster pobox.com>

This name is permanently attached to this particular commit. It sure is an unwieldy and ugly name, though. Wouldn't it be nice to have a shorter, prettier, wieldy name? And there is one: I can point you to v2.2.1:

$ git rev-parse v2.2.1^{commit}
9b7cbb315923e61bb0c4297c701089f30e116750

But in fact, v2.2.1 is not a commit at all, it's a tag. Specifically, it is a tag name (found in refs/tags/v2.2.1 or in the packed-refs file under the name v2.2.1) pointing to an annotated tag object,[2] rather than directly to a commit:

$ git rev-parse v2.2.1
7c56b20857837de401f79db236651a1bd886fbbb

The tag object has the commit ID inside it, plus a whole bunch of additional goop, including a "PGP signature":

$ git cat-file -p v2.2.1 | sed 's/@/ /'
object 9b7cbb315923e61bb0c4297c701089f30e116750
type commit
tag v2.2.1
tagger Junio C Hamano <gitster pobox.com> 1418851265 -0800

Git 2.2.1
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAABAgAGBQJUkfPBAAoJELC16IaWr+bLjfgP/iA78fk3NkTEROoyIVq6kPDH
pZAlm4ObsKXAdl6sFqWe7xFxGExHYzJ5L3qGXs3VM+9Z3iDe2WZN3WbK3aFtYqfU
AYRSTpnPzDf4L0vfyqiFS7//+LoeM2TogAV7SLdehMlodsL5HR6FiSz1zffSq8D0
Ci4XpGWHkqXLhfvUPC7foCgGpf7l38gsbJPbdkyKLK9/wtLSfkk45vK+wY6o3CCv
JKBFr468958fvw+j73nxiT+Vne7TeL1Bq1kCq9M65dAjOpFjZiD408NaF7jTcNcx
TMjdKoVlDNFHcUPMv9B5C308sRVUylmeUzb8XrQNji0+1NA5ivVgDfZsudWUtlTj
jo9xku0Np4IdXPwxJNlO5tC2rnof4gdD4jWPJj/DvellNKCDXuLuXDZSKZDI9GSr
OzLsad8uFX3MySPe+evIVF6qGS2KzI8PGNrohqWaPkX8cug22EW7lKJFpjYJb5gP
3nJUJvbsrMeyoH/GqxPzA5clqMGtsirnTiapMILNRmlC+3rzc0DkLw90BM6vKNOC
eDTOI9Xj1JS9qbD6fEkxVNrXRDz0TFbtpFbFTtKk4zfAc/jTOqE9fqpV7afoQfON
e1NwrjR5Kcts7ev23Y0G1WH3t2L0N2/q27kcjrulCEH1vtXlmaZFU6o+WKUVV7iH
/YQnjNUOgRxQ1zBGof7h
=yJ4Q
-----END PGP SIGNATURE-----

The PGP signature is what lets us decide whether we believe Junio C Hamano really made and signed this tag. It is a true digital signature, which is stronger than a bare SHA-1 hash (good, since SHA-1 is, at least in theory, breakable), and it also supports both distributed verification and the ability to revoke signatures (which SHA-1 itself does not).

In the end, though, that only helps us if someone we trust and/or can verify has made such a PGP-signed tag, or has PGP-signed a commit. In theory, signing each commit might be a bit stronger, since then there is a digital signature directly on the commit. In practice, signing tags is much more convenient, and just as good, since we don't regularly go about breaking SHA-1. (At least with current brute-force methods, breaking it would leave obvious marks. That is way beyond the scope of this answer, and also somewhat beyond me to describe properly, as cryptography is not my field.)


[1] Well, it's theoretically possible if you can break the SHA-1 hash. Even then, because of the way Git behaves when it meets a new, different object that nonetheless produces the same hash as an existing one, you won't ever pick up the new object if you already have the old one. This rule applies to all Git objects (commits, trees, annotated tags, and blobs), all of which are named by their hashes.
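To make "named by their hashes" concrete, here is a minimal sketch that derives a blob's object ID by hand. It assumes a repository using SHA-1 object names and a system that provides sha1sum; git_object_id is a made-up helper name, not a Git command.

```shell
#!/bin/sh
# Sketch: compute a Git object ID without calling git.
# The ID is the SHA-1 of the header "<type> <size>\0" followed by the raw content.
# git_object_id is a hypothetical helper; sha1sum is assumed to be installed.
git_object_id() {
    kind=$1    # object type, e.g. blob
    file=$2    # file holding the raw object content
    size=$(wc -c < "$file" | tr -d ' ')
    { printf '%s %s\0' "$kind" "$size"; cat "$file"; } | sha1sum | cut -d ' ' -f 1
}

tmp=$(mktemp)
printf 'hello\n' > "$tmp"
git_object_id blob "$tmp"     # same ID that: git hash-object "$tmp" would print
printf 'hello!\n' > "$tmp"
git_object_id blob "$tmp"     # one extra byte yields a completely different ID
rm -f "$tmp"
```

Commits are hashed the same way, with "commit" as the type and the text shown by git cat-file -p as the content, which is why any edit to a commit necessarily produces a different name.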

What git rebase and git commit --amend do, to make it seem like they changed commits, is to make new copies of existing commits, and then shuffle the names around. The new commits have new, different hashes, and since a later (descendant) commit literally contains the hash of its immediate ancestor (parent) commit, "changing" one commit's hash (i.e., copying the commit object to a new, different commit object) forces the change to bubble down through the rest of the commits. We then re-point the existing (short, branch or tag) name to the tip of the new chain.
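The copy-and-re-point behavior can be watched in a throwaway repository. This is only a sketch: the temporary directory, the demo identity, and the file name are all invented for the illustration.

```shell
#!/bin/sh
# Sketch: "git commit --amend" makes a new commit instead of editing the old one.
# Everything happens in a scratch repository; the identity below is made up.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"
git config user.email "demo@example.invalid"
git config commit.gpgsign false

echo one > file.txt
git add file.txt
git commit -q -m "first version"
before=$(git rev-parse HEAD)

git commit -q --amend -m "first version, reworded"
after=$(git rev-parse HEAD)

echo "before amend: $before"
echo "after amend:  $after"

# The original commit object still exists under its old name.
git cat-file -t "$before"

# The reflog records that HEAD moved from the old commit to the new one.
git reflog -n 2
```

The two IDs differ, and the old one still resolves to a commit object. In your own clone, the reflog is therefore one place where an amend or rebase leaves a trace, at least until its entries expire.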

This is why, given an end-point that we believe is trustworthy, we can extend that trust to each previous object in the chain or tree. The technical term for this is a Merkle tree.
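In practical terms, if you wrote down a commit's hash ID at some point, you can ask whether that exact commit is still part of the history behind an end-point you trust. The sketch below uses git merge-base --is-ancestor for that; commit_is_in_history is a made-up helper name, and the scratch repository exists only for the demo.

```shell
#!/bin/sh
# Sketch: is a previously recorded commit ID still reachable from a trusted tip?
# commit_is_in_history is a hypothetical helper around a real Git command.
commit_is_in_history() {
    # $1 = recorded commit ID, $2 = trusted end-point (branch, tag, or ID)
    git merge-base --is-ancestor "$1" "$2"
}

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"
git config user.email "demo@example.invalid"
git config commit.gpgsign false

echo one > file.txt
git add file.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)

echo two >> file.txt
git commit -q -am "second"
second=$(git rev-parse HEAD)

# "Change" the second commit: Git copies it and re-points the branch.
git commit -q --amend -m "second, rewritten"

for id in "$first" "$second"; do
    if commit_is_in_history "$id" HEAD; then
        echo "$id: still in the history of HEAD"
    else
        echo "$id: no longer reachable from HEAD"
    fi
done
```

The first commit is still an ancestor of HEAD, while the recorded ID of the amended commit is not, which is the signal that it was replaced by a copy.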

[2] This makes it what Git calls an "annotated tag": a tag name (which by itself would be a "lightweight tag") pointing to an annotated-tag object, stored in the Git repository, with the tag object pointing to some other Git object. That object is usually a commit, but perhaps another tag, or even a tree or a blob. However, even "another tag" is somewhat rare (there are just three of these in the Git repository for Git), and the other two are practically unheard-of.