Tara Roys - 2 days ago
Git Question

What is the file format of a git commit?

Context: I was hoping to be able to search through my git commit messages and commits without having to go through the puzzlingly complex git grep command, so I decided to see how git commit messages were stored.

I took a look in a .git folder, and it looks to me like commits are stored in

.git/objects


The .git/objects folder contains a bunch of folders with names like a6 and 9b. Each of these folders contains a file whose name looks like a commit sha, such as 2f29598814b07fea915514cfc4d05129967bf7. When I open one of those files in a text editor, I get gibberish.


  1. What file format is this gibberish / How is a git commit object stored?

  2. In this git commit log, the folder 9b contains one commit sha

    aed8a9f773efb2f498f19c31f8603b6cb2a4bc


    Why, and is there a case where more than one commit sha would be stored in the folder 9b?

  3. Is there a way to convert this gibberish into plain text so that I can mess with commits in a text editor?


Answer

Before you head down this path much further, I would recommend that you read through the section in the Git Manual about its internals. I find that knowing the contents of this chapter is usually the difference between liking Git and hating it. Once you understand why Git does things the way it does, many of its stranger commands start to make sense.

To answer your question, the gibberish that you are seeing is the data for the object after it has been compressed using zlib. If you look under the heading "Object Storage" in the link above you can see some details about how this works. This is the short version of how Git stores an object:

  1. Create a Git-specific header for the content (the object type and the content's size).
  2. Generate a SHA-1 hash of the concatenation of the header + content.
  3. Compress the concatenation of the header + content with zlib.
  4. Store the compressed data to disk in a folder named after the first two characters of the hash, in a file named after the remaining 38 characters.
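
The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than Git's own code: it assumes the default SHA-1 object format and a header of the form "<type> <size>" followed by a NUL byte, and the function name build_loose_object is made up for this example. It only computes the values and never writes to a repository.

```python
import hashlib
import zlib


def build_loose_object(content: bytes, obj_type: str = "blob"):
    """Return (object id, relative path, compressed bytes) for one object."""
    # Step 1: the header is "<type> <size in bytes>" followed by a NUL byte.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    store = header + content
    # Step 2: hash header + content (assumes the default SHA-1 format).
    object_id = hashlib.sha1(store).hexdigest()
    # Step 3: compress the same header + content bytes with zlib.
    compressed = zlib.compress(store)
    # Step 4: first two hex characters name the folder, the other 38 the file.
    path = f".git/objects/{object_id[:2]}/{object_id[2:]}"
    return object_id, path, compressed


object_id, path, compressed = build_loose_object(b"hello world\n")
print(object_id)
print(path)
```

If the assumptions hold, the printed id should match what git hash-object --stdin reports for the same input, which is an easy way to check the sketch against a real Git installation.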

That also answers your second question: a folder holds every compressed object whose hash begins with the same two characters, regardless of the objects' contents, so it can contain more than one file.
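
To see how objects share a folder, here is a rough Python sketch that walks an objects directory and groups the full object ids by their two-character folder. The function name group_loose_objects is hypothetical, and the demo uses a temporary directory with made-up file names instead of a real repository. It assumes that any entry whose name is not exactly two characters long (such as pack or info) is not one of these folders.

```python
import os
import tempfile
from collections import defaultdict


def group_loose_objects(objects_dir: str):
    """Map each two-character folder name to the full object ids inside it."""
    groups = defaultdict(list)
    for folder in sorted(os.listdir(objects_dir)):
        folder_path = os.path.join(objects_dir, folder)
        # Skip anything that is not a two-character folder, e.g. "pack".
        if len(folder) != 2 or not os.path.isdir(folder_path):
            continue
        for name in sorted(os.listdir(folder_path)):
            # Folder name + file name gives back the full 40-character id.
            groups[folder].append(folder + name)
    return dict(groups)


# Demo with made-up ids: two objects land in "9b", one in "a6".
with tempfile.TemporaryDirectory() as objects_dir:
    for fake_id in ("9b" + "a" * 38, "9b" + "c" * 38, "a6" + "0" * 38):
        os.makedirs(os.path.join(objects_dir, fake_id[:2]), exist_ok=True)
        open(os.path.join(objects_dir, fake_id[:2], fake_id[2:]), "wb").close()
    os.makedirs(os.path.join(objects_dir, "pack"))
    demo_groups = group_loose_objects(objects_dir)

for folder, ids in demo_groups.items():
    print(folder, len(ids))
```

Pointing group_loose_objects at a real .git/objects path should show the same pattern: most folders hold one or a few objects, and the count grows with the size of the repository.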

If you want to see the contents of an object, all you have to do is decompress it. If you just want to view the contents, this can be done easily enough in most programming languages. I would warn you against trying to modify the data, however. Modifying even a single byte in a file will change its hash. All of the metadata in Git (namely, directory structures and commits) is stored using references to hashes, so modifying a single file means that you must also update every downstream object that references that file's hash. Then you have to update all the objects that reference those hashes. And on, and on, and on... Trying to achieve this becomes very, very complicated very quickly. You'll save yourself a lot of time and heartache by just learning Git's built-in commands.
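
For read-only viewing, the decompression step might look like the Python sketch below. The helper names parse_loose_object and read_loose_object are made up for this example, and the commit body in the demo is illustrative (invented author and message), compressed in memory so the snippet runs without a repository. It assumes the object is stored as an individual file under .git/objects; objects that Git has moved into pack files cannot be read this way.

```python
import zlib


def parse_loose_object(raw: bytes):
    """Decompress one object and split it into (type, size, body)."""
    data = zlib.decompress(raw)
    # The header ends at the first NUL byte; everything after is the body.
    header, _, body = data.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    return obj_type, int(size), body


def read_loose_object(path: str):
    """Read and parse an object file such as .git/objects/9b/<38 characters>."""
    with open(path, "rb") as f:
        return parse_loose_object(f.read())


# Illustrative commit body with a made-up author and message.
sample_body = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 0 +0000\n"
    b"committer A U Thor <author@example.com> 0 +0000\n"
    b"\n"
    b"Initial commit\n"
)
sample_raw = zlib.compress(b"commit %d\x00" % len(sample_body) + sample_body)

obj_type, size, body = parse_loose_object(sample_raw)
print(obj_type, size)
print(body.decode("utf-8"))
```

Inside a repository, git cat-file -p followed by the object's hash prints the same plain-text view and also handles packed objects, so the sketch is mainly useful for understanding the format.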
