thefourtheye - 2 months ago
Git Question

How are git branches and tags stored on disk?

I recently checked one of my git repositories at work, which has more than 10,000 branches and more than 30,000 tags. The total size of the repo after a fresh clone is 12 GB. I am sure there is no reason to have 10,000 branches, and I suspect they occupy a considerable amount of disk space. So, my questions are as follows:


  1. How are branches and tags stored on disk? For example, what data structure is used, and what information is stored for every branch?

  2. How do I get metadata about the branches, such as when a branch was created and what its size is?


Answer

All git references (branches, tags, notes, stashes, etc.) use the same system, which has two parts:

  • the references themselves, and
  • "reflogs"

Reflogs are stored in .git/logs/refs/ based on the reference-name, with one exception: reflogs for HEAD are stored in .git/logs/HEAD rather than .git/logs/refs/HEAD.
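To make the reflog layout concrete, here is a minimal Python sketch. The helper names (reflog_path, parse_reflog_line) are mine, not part of git. The line format it assumes (old SHA-1, new SHA-1, identity, timestamp, timezone, then a tab and a message) is what I see in reflog files, so treat the parser as an illustration rather than a specification.

```python
from pathlib import Path


def reflog_path(git_dir, refname):
    """Return where the reflog for a full ref name would live.

    'refs/heads/master' maps to <git_dir>/logs/refs/heads/master,
    and 'HEAD' maps to <git_dir>/logs/HEAD.
    """
    return Path(git_dir) / "logs" / refname


def parse_reflog_line(line):
    """Split one reflog line into its fields (assumed layout, see above)."""
    header, _, message = line.rstrip("\n").partition("\t")
    old_sha, new_sha, ident = header.split(" ", 2)
    # The identity ends with "<unix timestamp> <timezone offset>".
    who, timestamp, tz = ident.rsplit(" ", 2)
    return {
        "old": old_sha,
        "new": new_sha,
        "who": who,
        "timestamp": int(timestamp),
        "tz": tz,
        "message": message,
    }
```

If the reflog has not expired, its first entry for a branch is usually the closest thing to a "created at" record you will find on disk.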

References come either "loose" or "packed". Packed refs are in .git/packed-refs, which is a flat file of (SHA-1, refname) pairs for simple refs, plus extra information for annotated tags. "Loose" refs are in .git/refs/name. These files contain either a raw SHA-1 (probably the most common), or the literal string ref: followed by the name of another reference for symbolic refs (usually only for HEAD but you can make others). Symbolic refs are not packed (or at least, I can't seem to make that happen :-) ).
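The two on-disk forms can be read with a few lines of Python. This is a sketch under assumptions: the function names are hypothetical, and I am assuming that the "extra information for annotated tags" appears as a line starting with ^ right after the tag's own line, holding the SHA-1 of the object the tag points to, and that lines starting with # are header comments.

```python
def parse_packed_refs(text):
    """Parse the contents of .git/packed-refs.

    Returns (refs, peeled): refs maps refname -> SHA-1, and peeled maps
    an annotated tag's refname -> SHA-1 of the object it points to.
    """
    refs = {}
    peeled = {}
    last_name = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or header comment
        if line.startswith("^"):
            # Assumed: extra info for the annotated tag on the previous line.
            if last_name is not None:
                peeled[last_name] = line[1:]
            continue
        sha, name = line.split(" ", 1)
        refs[name] = sha
        last_name = name
    return refs, peeled


def parse_loose_ref(content):
    """Classify the contents of a loose ref file under .git/refs/ (or HEAD)."""
    content = content.strip()
    if content.startswith("ref:"):
        return ("symbolic", content[len("ref:"):].strip())
    return ("sha", content)
```

For example, parse_loose_ref("ref: refs/heads/master\n") gives ("symbolic", "refs/heads/master"), which is what HEAD looks like when a branch is checked out.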

Packing tags and "idle" branch heads (those that are not being updated actively) saves space and time. You can use git pack-refs to do this. However, git gc invokes git pack-refs for you, so generally you don't need to do this yourself.
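To see how many refs in a repository are still loose (one small file each) versus packed (one line each in a single file), something like the following sketch could be used. count_refs is a hypothetical helper of mine, and it assumes the packed-refs layout described above, where lines starting with # or ^ are not refs themselves.

```python
import os
from pathlib import Path


def count_refs(git_dir):
    """Return (loose, packed) reference counts for a .git directory."""
    git_dir = Path(git_dir)

    # Every regular file under refs/ is one loose ref.
    loose = 0
    for _root, _dirs, files in os.walk(git_dir / "refs"):
        loose += len(files)

    # Every non-comment, non-"^" line in packed-refs is one packed ref.
    packed = 0
    packed_file = git_dir / "packed-refs"
    if packed_file.exists():
        for line in packed_file.read_text().splitlines():
            if line and not line.startswith(("#", "^")):
                packed += 1
    return loose, packed
```

Calling count_refs(".git") before and after git pack-refs (or git gc) should show refs moving from the first number to the second. Note that a ref can exist in both places, in which case the loose file is the one git uses.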