So I come from a centralized VCS background and am trying to nail down our workflow in Git (new company, young code base). One question I can't find a simple yet detailed answer to is what exactly a rebase on a remote branch does. I understand it rewrites the history, and in general should be limited to local branches only.
The workflow I'm currently trying to vet involves a remote collaboration branch, with each dev "owning" one for the purpose of sharing code. (With 2 developers, and at most 3 in the foreseeable future, a feature branch for each project and feature request seems excessive and more overhead than benefit.)
Then I came across this answer and tried it and it accomplished what I'd like - a dev commits and pushes often to his own collab branch, when he knows what is approved to be released to staging he can rebase remotely (to squash and perhaps reorganize) before merging into develop.
Enter the original question: if the remote branch is for the purpose of collaboration, someone else is bound to pull it sooner or later. Even if it is a process/training issue to keep the 'guest developer' from committing to that collab branch, what actually happens when the branch owner rebases that remote branch?
It's not really evil, it's a matter of implementations and expectations.
We start with a tangle of facts:
Every Git hash represents some unique object. For our purposes here we need only consider commit objects. Each hash is the result of applying a cryptographic hash function (for Git, specifically, it's SHA-1) to the contents of the object. For a commit, the contents include the ID of the source tree; the name and email address and time/date-stamp of the author and committer; the commit message; and most crucially here, the ID of the parent commit.
Changing even just a single bit in the content results in a new, very different hash ID. The cryptographic properties of the hash function, which serve to authenticate and verify each commit (or other object), also mean that it is, for practical purposes, infeasible for some different object to have the same hash ID. Git counts on this for transferring objects between repositories, too.
Rebase works (necessarily) by copying commits to new commits. Even if nothing else changes—and usually, the source code associated with the new copies differs from the original source code—the whole point of the rebase is to re-parent some commit chain. For instance, we might start with:
    ...--o--*--o--o--o   <-- develop
             \
              o--o   <-- feature
`feature` separates from branch `develop` at commit `*`, but now we would like `feature` to descend from the tip commit of `develop`, so we rebase it. The result is:
    ...--o--*--o--o--o   <-- develop
             \        \
              \        @--@   <-- feature
               \
                o--o   abandoned [used to be feature, now left-overs]
where the two `@`s are copies of the original two commits.
Branch names, like `develop`, are just pointers pointing to a (single) commit. The things we tend to think of as "a branch", like the two commits `@--@`, are formed by working backwards from each commit to its parent(s).
Branches are always expected to grow new commits. It's perfectly normal to find that `master` has some new commits added on, so that the name now points to a commit (or the last of many commits) that points back to where the name used to point.
Whenever you get your Git to synchronize (to whatever degree) your repository with some other Git and its other repository, your Git and their Git have an exchange of IDs—specifically, hash IDs. Exactly which IDs depends on the direction of the transfer, and any branch names you ask your Git to use.
A remote-tracking branch is actually an entity that your Git stores, associated with your repository. Your remote-tracking branch `origin/master` is, in effect, your Git's place to remember "what the Git at `origin` said his `master` was, the last time we talked."
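The first three facts above can be made concrete with a rough Python sketch. It hashes a hand-built commit the way Git does for SHA-1 repositories (SHA-1 over a `commit <size>\0` header plus the body) and shows that changing only the parent line yields a different ID. The author line, the message, and the two parent IDs are made-up stand-ins, and the helper names are mine, not anything Git exposes.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <size>' + NUL + body, as Git does for SHA-1 repos."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parent: str, message: str) -> bytes:
    # Hypothetical fixed author/committer so the example is deterministic.
    who = "A Dev <dev@example.com> 1400000000 +0000"
    return (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {who}\n"
        f"committer {who}\n"
        f"\n{message}\n"
    ).encode()


tree = git_object_id("tree", b"")  # ID of the empty tree
old_base = "1" * 40                # stand-in for commit * in the diagram
new_base = "2" * 40                # stand-in for the tip of develop

# Same tree, same author, same message: only the parent differs.
original = git_object_id("commit", commit_body(tree, old_base, "add feature"))
copy = git_object_id("commit", commit_body(tree, new_base, "add feature"))

print(original)
print(copy)
print("same commit?", original == copy)
```

Since the parent ID is part of the hashed content, a rebased commit cannot keep its old ID, and every commit stacked on top of it has to be copied as well.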
So, now we take these seven items, and look at how `git fetch` works. You might run `git fetch origin`, for instance. At this point, your Git calls up the Git on `origin` and asks it about its branches. They say things like `master = 1234567` and `branch = 89abcde` (though the hash values are all exactly 40 characters long, rather than these 7-character ones).
Your Git may already have these commit objects. If so, we are nearly done! If not, it asks their Git to send those commit objects, and also any additional objects your Git needs to make sense of them. The additional objects are any files that go with those commits, and any parent commit(s) those commits use that you do not already have, plus the parents' parents, and so on, until we get to some commit object(s) that you do have. This gets you all the commits and files you need for any and all new history.[1]
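Here is a toy model of that transfer, together with the remote-tracking update that follows it. It is only a sketch: a repository is reduced to a dict of commits (ID to parent IDs) and a dict of names, trees and blobs are ignored, and it assumes that having a commit means having all of its history (so no shallow clones). The `fetch` function and the short commit names other than `1234567` and `89abcde` are invented for illustration.

```python
def fetch(local_commits, local_refs, remote_commits, remote_refs, remote="origin"):
    """Copy commits we lack, then update our remote-tracking names."""
    received = []
    for name, tip in remote_refs.items():
        stack = [tip]
        while stack:
            cid = stack.pop()
            if cid in local_commits:
                continue  # we have this commit, hence (by assumption) its history
            local_commits[cid] = list(remote_commits[cid])
            received.append(cid)
            stack.extend(remote_commits[cid])
        # Only once the objects are stored does origin/<name> move.
        local_refs[f"{remote}/{name}"] = tip
    return received


# Their repository: two branches that split at commit "star".
theirs = {
    "o1": [], "star": ["o1"],
    "o2": ["star"], "1234567": ["o2"],   # master
    "o3": ["star"], "89abcde": ["o3"],   # branch
}
their_refs = {"master": "1234567", "branch": "89abcde"}

# Our repository: we last talked to them when master was at "star".
ours = {"o1": [], "star": ["o1"]}
our_refs = {"origin/master": "star"}

new = fetch(ours, our_refs, theirs, their_refs)
print("received:", sorted(new))
print("refs:", our_refs)
```

Note that nothing here touches a local `master` or `branch`: the fetch only moves the `origin/*` names.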
Once your Git has all the objects safely stored away, your Git then updates your remote-tracking branches with the new IDs. Their Git just told you that their `master` is `1234567`, so now your `origin/master` is set to `1234567`. The same goes for their `branch`: it becomes your `origin/branch` and your Git saves the ID `89abcde`.
If you now `git checkout branch`, your Git uses `origin/branch` to make a new local label, pointing to `89abcde`. Let's draw this:
    ...--o--*--o--1   <-- master, origin/master
             \
              o--8   <-- branch, origin/branch
(I shortened `1234567` to just `1` here, and `89abcde` to just `8`, to get them to fit better.)
To make things really interesting, let's make our own new commit on `branch`, too. Let's say it gets numbered `aaaaaaa...`:
    ...--o--*--o--1   <-- master, origin/master
             \
              o--8   <-- origin/branch
                  \
                   A   <-- branch
(I shortened `aaaaaaa...` to just `A`.)
The interesting question, then, is what happens if they (the Git from which you fetch) rebase something. Suppose, for instance, that they rebase `branch` onto `master`. This copies some number of commits. Now you run `git fetch` and your Git sees that they say `branch = fedcba9`. Your Git checks to see if you have this object; if not, you get it (and its files) and its parent (and that commit's files) and so on until we reach some common point, which will, in fact, be commit `1234567`.
Now you have this:
    ...--o--*--o--1   <-- master, origin/master
             \     \
              \     o--F   <-- origin/branch
               \
                o--8--A   <-- branch
Here I've written `F` for commit `fedcba9`, the one `origin/branch` now points to.
If you come across this later without realizing that the upstream guys rebased their `branch` (your `origin/branch`), you might look at this and think that you must have written all three commits in the `o--8--A` chain, because they're on your `branch` and not on `origin/branch` anymore. But the reason they're not on `origin/branch` is that the upstream abandoned them in favor of the new copies. It's a bit hard to tell that those new copies are, in fact, copies, and that you, too, should abandon those commits.
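The situation in that last diagram can be checked mechanically. The sketch below uses the same toy representation (a dict of commit ID to parent IDs, with invented names `o1`, `o2`, `o3` for the unlabeled commits) to test whether the old tip `8` is still in the history of the new tip `F`, and to list what is reachable from `branch` but not from `origin/branch`, which is roughly the set `git log origin/branch..branch` would show.

```python
def ancestors(commits, tip):
    """Return tip plus everything reachable by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(commits[cid])
    return seen


def is_fast_forward(commits, old_tip, new_tip):
    # A name "grew normally" if its old tip is still in the new tip's history.
    return old_tip in ancestors(commits, new_tip)


# Our repository after the second fetch, using the diagram's short names.
commits = {
    "*": [], "o1": ["*"], "1": ["o1"],      # master
    "o2": ["*"], "8": ["o2"], "A": ["8"],   # our branch: old copies plus A
    "o3": ["1"], "F": ["o3"],               # their rebased copies
}

rebased = not is_fast_forward(commits, "8", "F")
only_on_ours = ancestors(commits, "A") - ancestors(commits, "F")

print("origin/branch was rewound:", rebased)
print("on branch but not origin/branch:", sorted(only_on_ours))
```

All three commits of the `o--8--A` chain show up as "ours", even though only `A` really is. That is exactly the confusion described above.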
[1] If branches grow in the "normal", "expected" way, it's really easy for your Git and their Git to figure out which commits your Git needs from them: your `origin/master` tells you where you saw their `master` last time, and now their `master` points further down a longer chain. The commits you need are precisely those on their `master` that come after the tip of your `origin/master`.
If branches are shuffled around in less-typical ways, it's somewhat harder. In the most general case, they simply have to enumerate all their objects by hash IDs, until your Git tells them that they have reached one you already have. The specific details get further complicated by shallow clones.
It's not impossible to tell, and since Git version 2.0 or so, there are now built-in tools to let Git figure it out for you. (Specifically, `git merge-base --fork-point`, which is invoked by `git rebase --fork-point`, uses your reflog for `origin/branch` to figure out that the `o--8` chain used to be on `origin/branch` at one point. This only works for the time-period that those reflog entries are retained, but this defaults to at least 30 days, giving you a month to catch up. That's 30 days in your time-line: 30 days from the time you run `git fetch`, regardless of how long ago the upstream did the rebase.)
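The idea behind the fork-point trick can be sketched in the same toy model. This is a deliberate simplification and not Git's actual algorithm: it just takes the remembered previous values of `origin/branch` (newest first) and picks the first one that is still in our branch's history. Everything after that point is what we really wrote ourselves. The function names and the `o1`, `o2`, `o3` commit names are invented for illustration.

```python
def ancestors(commits, tip):
    """Return tip plus everything reachable by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(commits[cid])
    return seen


def fork_point(commits, upstream_reflog, branch_tip):
    """upstream_reflog: previous values of origin/branch, newest first."""
    mine = ancestors(commits, branch_tip)
    for old_tip in upstream_reflog:
        if old_tip in mine:
            return old_tip
    return None


def commits_to_replay(commits, start, branch_tip):
    """First-parent walk from branch_tip back to (not including) start."""
    out, cid = [], branch_tip
    while cid is not None and cid != start:
        out.append(cid)
        parents = commits[cid]
        cid = parents[0] if parents else None
    return list(reversed(out))


commits = {
    "*": [], "o1": ["*"], "1": ["o1"],
    "o2": ["*"], "8": ["o2"], "A": ["8"],
    "o3": ["1"], "F": ["o3"],
}
reflog = ["F", "8"]  # origin/branch pointed to 8 before the fetch moved it to F

base = fork_point(commits, reflog, "A")
replay = commits_to_replay(commits, base, "A")

print("fork point:", base)
print("commits to copy onto F:", replay)
```

With the reflog entry for `8` available, only `A` gets carried over. Once that entry has expired, this model (like the real tool) has nothing to go on, and the whole `o--8--A` chain looks like yours again.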
What this really boils down to is that if you and your upstream agree, in advance, that some particular set of branch(es) get rebased, you can arrange to do whatever is required in your repository every time they do this. With a more typical development process, though, you won't expect them to rebase, and if they don't—if they never "abandon" a published commit that you have fetched—then there's nothing you need to recover from.