aidonsnous - 2 months ago
Git Question

What is the scope of the "git rm" command: does it affect all branches or only the HEAD branch?

Does git rm <object> affect all branches, or only HEAD?

From what I understand, it affects only HEAD, since for that command to take effect it must be followed by a commit command, but here comes my problem.

What about git rm --cached <object>? It doesn't remove the file from the working tree; it only stops the file from being tracked and adds the object to .gitignore.

My questions :


  1. Does it remove the object in all branches?

  2. Does it add the object to .gitignore, so that it is ignored in the next commit of every branch in the repository (HEAD and non-HEAD), or only in the current branch (HEAD)?


Answer

dirn's answer is correct, so this is a bit redundant, and there are plenty of other questions about both git rm and what it means for a file to be tracked or ignored, but based on comments, I think this might be a better way to explain these several items:

What does it mean for a file to be tracked or untracked?

The short (and not 100% accurate but close enough) answer to this is that a file is tracked if it is in the index.
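To see this on a real repository, here is a minimal sketch. It assumes git is on your PATH and works in a throwaway directory created with mktemp -d; tracked.txt and scratch.txt are made-up names. git ls-files lists what is in the index, and git ls-files --others lists work-tree files that have no index entry:

```shell
set -e
cd "$(mktemp -d)"              # throwaway directory, so nothing real is touched
git init -q
echo hello > tracked.txt
echo notes > scratch.txt
git add tracked.txt            # copy tracked.txt into the index

git ls-files                   # index entries, i.e. tracked files: tracked.txt
git ls-files --others          # work-tree files with no index entry: scratch.txt
```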

What does git rm do?

The git rm command does two things by default:

  1. It removes the file's entry from the index. Since the current commit still has the file and the index no longer does, Git reports this as a staged deletion, and the next commit will not contain the file.

  2. It removes a file from the work-tree.

When you add --cached (as in git rm --cached file1 dir/file2), Git does only the first step and skips the second. For each file given as an argument, it drops that file's entry from the index, which amounts to saying "when the next commit gets made, leave this file out of it." The copy in the work-tree stays where it is. Because that copy is no longer in the index, the file is now untracked: git status lists it both under "Changes to be committed" (as deleted) and under "Untracked files".
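A minimal sketch of the difference, in a throwaway repository (file1 and file2 are illustrative names, and the user.name/user.email values exist only so the demo commit succeeds). After both commands, neither file is in the index, only file2 is left in the work-tree, and no .gitignore has appeared:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name demo                  # throwaway identity for the demo commit
git config user.email demo@example.com
echo a > file1
echo b > file2
git add file1 file2
git commit -q -m 'add two files'

git rm -q file1             # drops the index entry AND deletes the work-tree file
git rm -q --cached file2    # drops the index entry only; file2 stays on disk

git ls-files                      # prints nothing: neither file is in the index
git diff --cached --name-status   # D file1 and D file2: both staged as deletions
git status --short                # D  file1, D  file2, and ?? file2 (now untracked)
test -e .gitignore || echo 'git rm did not create a .gitignore'
```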

Note that git rm does not touch .gitignore at all, in any way, unless of course you run git rm .gitignore (in which case it removes .gitignore from the index and deletes the work-tree file, just as it would for any other file).

What is the index anyway?

The index has a big and very special role in merges, but ignoring that, the index mainly serves as a way for Git to stage "the next commit" one piece at a time, so that git commit itself is very fast. Many other version control systems do not have an index at all, and instead construct something much like it at commit-time by scanning every file. This takes a significant amount of time. What Git does instead is to "pre-prepare" each commit: the index holds the next commit at all times.

This means that after you make a commit, the index matches the commit exactly.1 It also means that if you now run git checkout dev or git checkout feature or git checkout master, Git can switch to that commit on that branch by comparing that commit's files to those in the current index. It only needs to change out files that are different—or, of course, remove files that are in the current commit but not in the to-be-checked-out commit, or add new files that are not in the current commit but are in the to-be-checked-out commit. So the index not only speeds up git commit, it also speeds up git checkout.


1 This gets complicated by things like core.eol settings and what Git calls smudge filters, so let's ignore those. :-)
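One way to check that the index matches the commit right after git commit is the plumbing command git write-tree, which writes the current index out as a tree object and prints its hash ID. A minimal sketch, assuming a throwaway repository and an illustrative notes.txt; it only compares that ID with the tree of HEAD and says nothing about how git commit is implemented internally:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name demo
git config user.email demo@example.com
echo v1 > notes.txt
git add notes.txt
git commit -q -m v1

index_tree=$(git write-tree)                 # tree built from the index
commit_tree=$(git rev-parse 'HEAD^{tree}')   # tree stored in the current commit
echo "index: $index_tree  HEAD: $commit_tree"  # the same ID twice

echo v2 > notes.txt
git add notes.txt                            # stage part of the next commit
staged_tree=$(git write-tree)                # now differs from HEAD's tree

git commit -q -m v2                          # after committing, they agree again
echo "index: $(git write-tree)  HEAD: $(git rev-parse 'HEAD^{tree}')"
```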


(And, of course, the index has that special role in merges. In fact, for each file, the index has up to four slots, rather than just one slot. These are called "stage numbers", and Git only ever uses at most three stage numbers per file. Merging uses stages 1, 2, and 3, while normal operation uses only stage zero. The index actually stores only the file hash IDs. The file data—the actual contents of each file—are kept in Git's objects, inside the repository.)
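git ls-files --stage prints the stage number of each index entry. In this sketch (throwaway repository; f.txt and the branch name side are made up), the file sits in slot 0 during normal work, and a conflicting merge replaces that entry with three entries in slots 1 (merge base), 2 (ours) and 3 (theirs):

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name demo
git config user.email demo@example.com
echo base > f.txt
git add f.txt
git commit -q -m base
start=$(git symbolic-ref --short HEAD)   # default branch name varies (master/main)

git checkout -q -b side
echo theirs > f.txt
git commit -q -am 'change on side'
git checkout -q "$start"
echo ours > f.txt
git commit -q -am 'change on start branch'

git ls-files --stage                # normal operation: 100644 <hash> 0  f.txt
git merge -q side || true           # conflicts, so Git stops mid-merge
git ls-files --stage                # f.txt now has entries in stages 1, 2 and 3
```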

In summary, though, the index is what will go into the next commit you make. You git add and git rm files to update the index, and then you git commit to turn the index contents (the set of all tracked files) into a new commit. Whatever is in the index becomes what is in that commit. Making the new commit extends the current branch by one commit, so that the branch name points to the new commit you just made.

No existing commit is ever changed by any new commit. In fact, Git can't change any object (commit, tree, file, or annotated tag), by design. Commands that seem to change something, like git commit --amend or git rebase, actually fake it: they make new commits, leaving the old ones in place, undisturbed, but then pull a stage-magician's trick, using smoke and mirrors to make it appear as though the new ones have replaced the old ones.
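This also answers the original question directly. In the sketch below (throwaway repository; object.txt, other.txt and the branch dev are illustrative names), git rm --cached followed by git commit moves only the current branch: the dev branch and the older commit still contain object.txt, and the file is still in the work-tree.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name demo
git config user.email demo@example.com
echo x > object.txt
echo y > other.txt
git add object.txt other.txt
git commit -q -m initial
first=$(git rev-parse HEAD)
git branch dev                          # second branch, same commit, not checked out

git rm -q --cached object.txt           # changes only the index...
git commit -q -m 'stop tracking object.txt'   # ...and only the current branch moves

git ls-tree --name-only HEAD            # other.txt
git ls-tree --name-only dev             # object.txt and other.txt: dev is untouched
git ls-tree --name-only "$first"        # the old commit still has object.txt as well
ls                                      # object.txt is still in the work-tree
```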

What does .gitignore do? (More about untracked files)

I like to say that .gitignore is the wrong name for this file, because it's not really a list of files, or even glob patterns, to ignore. Ignoring is more of a side effect than anything else. The real question is what goes into commits, and that's determined, as we just noted, by what is in the index. In other words, the real question is which files are tracked, and which are not tracked.

When you run git status—which you should do often—you get output like this (real output, but trimmed a bit for posting purposes):

On branch master
Changes to be committed:
    modified:   pack.c

Changes not staged for commit:
    modified:   pytest/client.py

Untracked files:
    pytest/README

What git status does (among other things) is to run two diffs, one from the current commit to the index—this is where it finds "changes to be committed"—and one from the index to the work-tree. The latter finds "changes not staged for commit", such as pytest/client.py here, and "untracked files", such as pytest/README here.

We already noted that an untracked file is one that is not in the index. So pytest/README must not be in the index (and in fact it isn't).
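The two diffs, plus the untracked-file scan, can be reproduced by hand. This is an approximation for illustration, not how git status is implemented internally; the repository is a throwaway one, with flat stand-in names pack.c, client.py and README:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name demo
git config user.email demo@example.com
echo 1 > pack.c
echo 1 > client.py
git add pack.c client.py
git commit -q -m initial

echo 2 > pack.c
git add pack.c               # staged change
echo 2 > client.py           # modified but not staged
echo docs > README           # new file, never added

git diff --cached --name-only              # HEAD vs index: pack.c ("to be committed")
git diff --name-only                       # index vs work-tree: client.py ("not staged")
git ls-files --others --exclude-standard   # no index entry, not ignored: README
```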

Now, there are also a whole bunch of *.o files (from the C code) and *.pyc files (from the Python code). These are also not in the index, but git status is not complaining about them. That's because they are mentioned, by glob pattern, in .gitignore files.

Right before git status goes to complain about untracked files, git status looks at the information from the .gitignore files. If a file is untracked, but is also marked as ignored, git status suppresses its complaint. So in that sense, files in .gitignore are "don't complain about".

At the same time, though, I can do git add . or git add * to add multiple files to the index. This will update the index entry if the file is already there, or add a new entry if not. Just before git add actually adds a new file to the index, though, it looks at the information taken from the .gitignore files. If the file is untracked (not already in the index) and is marked as ignored, git add won't add it. But if the file is already tracked, git add never goes down this particular code path, and Git simply updates the file's index entry.

In other words, for an already tracked file, an entry in .gitignore has no effect. In this sense, then, files in .gitignore are "don't auto-add these files, but do update them if they're already added". Note that you can use git add -f (or --force) to add a file that is listed as ignored, i.e., to force past this "don't add" instruction.
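A sketch of these first two behaviors, in a throwaway repository whose .gitignore contains *.o (main.c, main.o and old.o are made-up names). main.o is never reported or auto-added; old.o is force-added once with git add -f, and after that .gitignore has no effect on it. git check-ignore reports which paths an ignore rule applies to, and by default it skips tracked paths:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name demo
git config user.email demo@example.com
echo '*.o' > .gitignore
echo code > main.c
echo obj > main.o
echo obj > old.o
git add -f old.o        # -f forces past the ignore rule: old.o becomes tracked
git add .               # adds .gitignore and main.c; silently skips ignored main.o
git commit -q -m initial

git ls-files            # .gitignore, main.c, old.o (no main.o)
git status --short      # nothing: main.o is untracked but ignored, so no complaint
git check-ignore main.o                     # main.o: an ignore rule applies
git check-ignore old.o || echo 'old.o is tracked, so ignore rules do not apply'

echo changed > old.o
git add .               # old.o is already tracked, so the update is staged anyway
git diff --cached --name-only               # old.o
```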

Files listed in .gitignore have yet a third property, though. Normally, when Git is doing some operation that might clobber a file, such as checking out a different commit when the file is untracked right now (it is in the work-tree but not in the index) and the commit to check out does contain that file, Git will stop and complain that your request would overwrite an untracked working-tree file. But if that file is listed as ignored, Git considers the file "non-precious" or "trashable". In this case, Git will go ahead and overwrite the file.

Those, then, are the three meanings covered by .gitignore: don't complain about untracked files, don't add them automatically, and feel free to trash them. So a single file name that covers all cases would be .git-dont-complain-about-and-dont-add-but-do-trash-these-files, or something like that. You can see why it's called .gitignore instead. :-)

An unavoidable flaw with git rm --cached

One of the main reasons to use git rm --cached is the common case of committing a file by mistake.

For instance, suppose that src.tar is a tarball file containing all the other files, and it's in an early commit and has been left in place since then. It is full of now-outdated code and should be removed. This is no problem: you just git rm src.tar, commit, and move on. Nothing was using it; it was just clutter. It's in the repository forever, but nobody really cares.

On the other hand, suppose database.sql accidentally got committed, and it's a big and active database and has to stay in the work-tree, but was never supposed to have been committed. In this case, you git rm --cached database.sql, and add database.sql or *.sql to .gitignore to make sure it doesn't get git add-ed by mistake later, and then git commit. Well, that's fine for you: you've made a new commit in which the file no longer exists, and it's now out of the index and git status no longer complains about it, and so on.

But if you ever git checkout an older commit, now you're in trouble. In the older commit, database.sql exists. So Git will go to clobber the current version of the file, replacing it with the old one. If the file were not in .gitignore, Git would at least stop with an error instead of overwriting it; but it is in .gitignore, so Git feels free to clobber the database.
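A minimal sketch of this trap, in a throwaway repository with a tiny text file standing in for the real database (the .gitignore is left uncommitted just to keep the demo short). The first checkout of the older commit is refused because the untracked database.sql is not ignored; once it is listed in .gitignore, the same checkout silently replaces the live contents:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name demo
git config user.email demo@example.com
echo 'old snapshot' > database.sql
git add database.sql
git commit -q -m 'commit database.sql by mistake'
git rm -q --cached database.sql
git commit -q -m 'stop tracking database.sql'
echo 'live data' > database.sql          # the work-tree copy keeps changing

# Not ignored: Git refuses to overwrite the untracked file.
if git checkout -q HEAD~1 2>/dev/null; then
  echo 'checkout went ahead'
else
  echo 'checkout refused; database.sql is untouched'
fi
kept=$(cat database.sql)                 # live data

# Ignored (via an uncommitted .gitignore): the same checkout now clobbers it.
echo 'database.sql' > .gitignore
git checkout -q HEAD~1
cat database.sql                         # old snapshot: the live data is gone
```

The git checkout documentation also lists a --no-overwrite-ignore option that is meant to abort instead of overwriting ignored files. It is a per-command option, though, so at best it is one more partial cure.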

There is no perfect cure for this. You can leave it as not-ignored, so that Git does not feel free to clobber it. That will prevent you from checking out the older commit, though. (This might be OK since you probably shouldn't be doing that on a live server.) It will also let someone accidentally re-add the database. (This might be OK as people probably should not be doing Git work on the live server.) And, it will keep showing up in git status. (People probably should not be doing Git work on the live server. There's a theme here... :-) )

There are other, similar cases of inappropriately committed files, though, and as with all of them, there is no perfect cure for having committed them. Just remember that when you git checkout a commit that does have the file, Git will attempt to check that out into the work-tree, and when you then go from that commit to one that does not have the file, Git will attempt to remove it.
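For example, with the src.tar case (a sketch in a throwaway repository, where a short text file stands in for the tarball), checking out the older commit puts the file back into the work-tree, and returning to the branch tip with git checkout - removes it again:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name demo
git config user.email demo@example.com
echo 'outdated code' > src.tar       # stand-in content, not a real tarball
git add src.tar
git commit -q -m 'add src.tar'
git rm -q src.tar
git commit -q -m 'remove src.tar'
tip=$(git rev-parse HEAD)

git checkout -q HEAD~1               # this commit has src.tar, so Git writes it out
old_copy=$(cat src.tar)              # "outdated code"

git checkout -q -                    # back to the branch, whose tip lacks src.tar
test -e src.tar || echo 'src.tar was removed from the work-tree again'
```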