John 41_ - 3 years ago
Git Question

Git: how is the working directory populated?

I am wondering how the Git working directory (the "working tree") is populated.

Are the files somehow extrapolated through the tree-like relationship that starts at the commit HEAD refers to and works "backwards" towards the root of the tree?

Maybe someone could describe the high-level process that occurs, i.e.:

1.) Add all the files contained in the commit referred to by HEAD to the working tree.

2.) Recursively, for each file referenced by the parent commit of HEAD, add those to the working tree as well.

I'm curious how this works. Is there a verbose mode for something like git checkout where a hypothetical function called build_working_tree() would output its actions?

Answer Source

Ignore for the moment exactly how a repo creates file trees and points to individual files. Let's just stick to the fact that a commit holds references to these objects. The important point is that if an object is exactly the same as before (for a folder, that means exactly the same folders and files inside it), then Git will just point to the same object in a future commit.
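To make the "same content, same object" idea concrete, here is a minimal Python sketch of how Git names a file's content. It assumes a repository using the default SHA-1 object format; the variable names are made up for illustration.

```python
import hashlib

def blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git gives a blob with this content."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# The ID depends only on the bytes, not on the file name or the commit,
# so an unchanged file maps to the same object in every commit.
file1_in_c1 = blob_id(b"hello\n")
file1_in_c2 = blob_id(b"hello\n")
file1_edited = blob_id(b"hello world\n")

print(file1_in_c1 == file1_in_c2)   # unchanged content, same object
print(file1_in_c1 == file1_edited)  # changed content, different object
```

Because the ID is derived from the content, a later commit does not have to copy an unchanged file; it only has to record the ID it already knows.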

Assume c1 is the initial commit and contains just file1:

c1 -> file1

Then commit c2 is made. It has the same file1, so it just creates a reference to the existing object for that (the same object c1 points to). It also adds a folder dir1 and a file2 inside of dir1, so it creates links to those.

c2 ----> dir1 -> file2
     \         
c1 -> file1

Now add a commit c3. Again, file1 is the same, so c3 can still point to the same object. file2 is also the same, but a new file3 is added to dir1. This means dir1 has to change (I show this as dir1*), but it can still point to the old file2 object. A new file3 is added to dir1* as well.

c3 -> dir1* ------> new file3
   \            \
c2 -\ -> dir1 -> file2
     \         
c1 -> file1

The point is, you don't need to know anything about c1, c2, or even dir1 in order to recreate the working directory for c3. It is pointing to file1, dir1*, file2, and file3, and can find them in the object repo without needing to know about the other objects.
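The c1, c2, c3 example can be played through with a toy object store. This is only a sketch of the idea, not Git's real storage format: the JSON encoding, the in-memory dict, and the function names (put, blob, tree, commit, checkout) are all assumptions made for illustration.

```python
import hashlib
import json

store = {}  # object ID -> object; stands in for the object repo

def put(obj):
    """Store an object under the hash of its content and return its ID."""
    data = json.dumps(obj, sort_keys=True).encode()
    oid = hashlib.sha1(data).hexdigest()
    store[oid] = obj  # identical content lands on the same key
    return oid

def blob(text):
    return put({"type": "blob", "data": text})

def tree(entries):
    # entries maps a name to the ID of a blob or of another tree
    return put({"type": "tree", "entries": entries})

def commit(tree_id, parent=None):
    return put({"type": "commit", "tree": tree_id, "parent": parent})

def checkout(commit_id):
    """Return {path: content} for one commit; the parent is never read."""
    files = {}

    def walk(tree_id, prefix):
        for name, oid in store[tree_id]["entries"].items():
            obj = store[oid]
            if obj["type"] == "tree":
                walk(oid, prefix + name + "/")
            else:
                files[prefix + name] = obj["data"]

    walk(store[commit_id]["tree"], "")
    return files

file1 = blob("one\n")
file2 = blob("two\n")
file3 = blob("three\n")

c1 = commit(tree({"file1": file1}))
dir1 = tree({"file2": file2})
c2 = commit(tree({"file1": file1, "dir1": dir1}), parent=c1)
dir1_star = tree({"file2": file2, "file3": file3})  # dir1 changed, file2 reused
c3 = commit(tree({"file1": file1, "dir1": dir1_star}), parent=c2)

working_tree = checkout(c3)
for path in sorted(working_tree):
    print(path)
```

Note that checkout(c3) only follows c3's own tree. The parent link is stored in the commit but is used for history, not for rebuilding the files.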

Now, there is more to it, of course. For example, Git sometimes stores only the differences between versions of a file, typically when objects are packed and the diff is small compared to the file (among other optimizations). But this high-level conception covers the basic idea.

As for the lower-level plumbing commands: yes, they do exist, and Git actually uses them when it does its thing. These are outlined in the link that Chris gave in his comment: Git Internals: Git Objects. It shows you how to follow the commit hash into the objects stored in the repo and display the text in each one, both the hash pointing to each object and the actual object itself.
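If you want to peek at an object yourself, the sketch below reads one "loose" object straight from disk. It assumes the object has not been packed and that the repo uses the usual layout (objects/<first 2 hex digits>/<remaining digits>, zlib-compressed, with a "<type> <size>\0" header); read_loose_object is a made-up helper name, not part of Git.

```python
import os
import zlib

def read_loose_object(git_dir, object_id):
    """Return (type, content) for a loose object under git_dir/objects."""
    # Loose objects live at objects/<first two hex digits>/<rest of the ID>.
    path = os.path.join(git_dir, "objects", object_id[:2], object_id[2:])
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    # The decompressed data is "<type> <size>\0" followed by the content.
    header, _, content = raw.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content
```

Called as read_loose_object(".git", some_hash), this returns the object's type (commit, tree, or blob) and its bytes. Commits and blobs of text files are readable as-is; tree contents are binary. In practice git cat-file -p <hash> is the easier route, since it also handles packed objects and prints trees in readable form.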
