kc2 - 1 month ago
Git Question

How to find out what changes on a branch after merges from master?

I forked a branch off master. From time to time I merge master into my branch to pick up updates.

       /--B--D--G--H--J--L--M--> dev (HEAD)
      /        /     /
-----A--C--E--F--I--K---> master


How can I show the changes only on my branch, excluding those from the merges?
I.e., show the differences only for commits B, D, H, L, and M.

git diff A doesn't work because it includes merged changes from master.

BTW, I would appreciate it if anyone knows a quick way to find A without having to keep scrolling down the log.

Answer

It's not clear to me precisely what you're looking for. But note that:

  • Commits store snapshots. This means that commit K has a complete source tree that is independent of whatever is in commits A, B, C, and so on. Likewise, commit J has a complete snapshot, independent of whatever is in A, or K, or any other commit.

    (The word "independent" here means that if you ask Git to retrieve commit J, it would not matter if you had somehow managed to alter commit K after making J. It's not actually possible to alter any commit, ever; git commit --amend seems to change a commit, but really makes a new commit and leaves the old one untouched.)

  • When you run git diff on two specific commits, Git extracts each of the two snapshots and compares them.

Simple diffs

Therefore, git diff K M will show you what is different between the current tip of master (commit K), and the current tip of dev (commit M).

You can also just spell this git diff master dev.

That may be what you want to see (again, it's not really clear to me). So answer 1: git diff master dev.

Multiple independent diffs

On the other hand, maybe what you want is to show one diff for commit B, one diff for commit D, one diff for commit H, one diff for commit L, and one diff for commit M. That is, you want to see each non-merge commit, one at a time, as compared to its (single) parent.

You could git diff A B, and git diff B D, and so on. But you can also just git show B, and git show D, and so on. This shows you the commit as a change, instead of as a snapshot.

You may be wondering how this is possible if commits store snapshots, rather than changes. (This trips a lot of people up since most other Version Control Systems actually do store changes.) The answer to this apparent contradiction is that git show consults the same graph you drew to find the commit's parent, then compares the two snapshots.

Look again at commit B. What commit comes before it? That is, which commit is to its left, following lines leftwards? There's only one possible ancestor for commit B, and that's its single parent, commit A. So git show B:

  1. extracts the snapshot for commit A, then
  2. extracts the snapshot for commit B, then
  3. diffs those two snapshots.

Similarly, there's only one immediate ancestor (parent) for commit M, and that's commit L. So git show M:

  1. extracts the snapshot for L, then
  2. extracts the snapshot for M, then
  3. diffs those two snapshots.
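The equivalence between git show and a parent-to-child git diff can be checked in a throwaway repository. This is only a minimal sketch, not part of the original setup: the temporary directory, the notes.txt file, the demo identity, and the tags L and M are all made up for the illustration.

```shell
set -e
# Hypothetical scratch repository; nothing here touches a real project.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Two commits, tagged L and M so they can be named like the commits in the graph.
echo "first line" > notes.txt
git add notes.txt
git commit -q -m L
git tag L
echo "second line" >> notes.txt
git commit -q -a -m M
git tag M

# --format= drops the log message, leaving only the patch that git show computes.
show_patch=$(git show --format= M)
diff_patch=$(git diff L M)
[ "$show_patch" = "$diff_patch" ] && echo "git show M and git diff L M produce the same patch"
```

The comparison uses --format= only so that the two outputs can be compared directly; a plain git show M prints the log message first and then the same patch.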

If this is what you want, the interesting question becomes: how do you find the IDs of each commit in the B, D, H, L, and M sequence? The answer to this is a bit complicated, but the key command is git rev-list, which is essentially the same command as git log. What these commands (git log and git rev-list both) do is to walk the commit graph. That is, you pick some starting point—in this case, commit M, the tip of dev—and tell Git to walk backwards through the commits, looking at each commit's parents.

The problem is that when you hit a merge commit, such as commit J, Git walks back to all of its parents. You want to restrict Git to finding only the parent that was the tip of branch dev when you made the merge commit. Git has a flag for this, spelled --first-parent. This tells git rev-list to follow only the first parent of each merge commit.

We also want to skip the merges, so we can add --no-merges (this does not affect the walking-back process, it just limits the printed revision IDs to exclude the merges).

This leads to answer 2a: git rev-list --first-parent --no-merges ^A dev (we'll get to the "not A" part later).

Now, actually using this with git rev-list is kind of a pain, because now we have to take each commit ID and then run git show on it. There's a much easier way, though, because git rev-list and git log are essentially the same command, and git log -p shows each commit as a patch, doing much the same thing as git show.

This leads to answer 2b: git log -p --first-parent --no-merges ^A dev. We don't necessarily need the --no-merges here either; see below.
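To see what the --first-parent walk selects, the graph from the question can be rebuilt in a scratch repository. This is a sketch under assumptions: commit subjects stand in for the letters, G is assumed to merge F and J to merge K, and the file names master.txt and dev.txt, the demo identity, and the commit helper are invented for the example.

```shell
set -e
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# commit <subject> <file>: append one line to <file> and commit it (hypothetical helper).
commit() { echo "$1" >> "$2"; git add "$2"; git commit -q -m "$1"; }

commit A master.txt; git tag A
git checkout -q -b dev
commit B dev.txt; commit D dev.txt
git checkout -q master
commit C master.txt; commit E master.txt; commit F master.txt
git checkout -q dev
git merge -q -m G master            # merge commit G
commit H dev.txt
git checkout -q master
commit I master.txt; commit K master.txt
git checkout -q dev
git merge -q -m J master            # merge commit J
commit L dev.txt; commit M dev.txt

# Follow only first parents from dev, stop at A, and skip the merges.
git log --first-parent --no-merges --format=%s ^A dev
# prints M, L, H, D, B (one per line)

# Without --first-parent the walk also enters the commits merged from master.
git rev-list --no-merges --count ^A dev
# prints 10: B, D, H, L, M plus C, E, F, I, K
```

Replacing --format=%s with -p in the first command gives the per-commit patches of answer 2b.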

Merge commits and combined diffs

The one special thing that git show does, that git diff doesn't, is to handle the case of showing a merge commit, such as commit J. Commit J has two parents, namely commits H and K (incidentally, this implies that you made commit K before making commit J :-) ). If you run git diff H J, Git will extract the snapshots for H and J and compare them. If you run git diff K J, Git will extract the snapshots for K and J and compare them. But if you run git show J, Git will:

  1. extract the snapshot for H, then
  2. extract the snapshot for K, then
  3. extract the snapshot for J (the merge), and finally
  4. produce what Git calls a combined diff.

The combined diff from step 4 attempts to show, in a compact fashion, changes from both H and K to J. In compacting the changes, Git throws out any file where the version in either H or K matches the version in J.

That is, suppose file README.txt has some change from H to J. But suppose that README.txt is the same in K and J. In other words, when you did the git merge, you were picking up changes from K in order to make J, and there were no changes to README.txt from the other side of the merge. This means README.txt exactly matches one "incoming side", and hence the combined diff completely ignores the file.

This means combined diffs often show nothing at all, even though the merge picked up some change(s) in the new snapshot. To see those changes, you must make two diffs, not just one. You must make one diff from H to J, and another diff from K to J, rather than relying on the combined diff.
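The README.txt situation can be reproduced with a single merge. This is a hedged sketch: the file other.txt, the "base" commit, the demo identity, and the tag J are assumptions made for the example, and J^1 and J^2 are Git's standard notation for the first and second parent of J (here H and K).

```shell
set -e
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Common starting point with two files.
echo "readme v1" > README.txt
echo "other v1" > other.txt
git add README.txt other.txt
git commit -q -m base

# H changes other.txt on dev; K changes README.txt on master.
git checkout -q -b dev
echo "other v2" > other.txt
git commit -q -a -m H
git checkout -q master
echo "readme v2" > README.txt
git commit -q -a -m K

# J merges master into dev.
git checkout -q dev
git merge -q -m J master
git tag J

# Combined diff: each file matches one parent, so no patch is printed.
git show --format= J

# Two ordinary diffs, one per parent, reveal what the merge brought together.
git diff --name-only J^1 J   # README.txt: picked up from master (K)
git diff --name-only J^2 J   # other.txt: the change made on dev (H)
```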

When using git log -p, you can also see combined diffs for merges, by adding -c or --cc to the options. But if you don't ask for this, what Git does is actually laughably simple: it just doesn't bother to show a diff at all.

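So this leads to answer 2c: git log -p --first-parent ^A dev. All we did is drop the --no-merges: we will now see each merge's log message, but no diff. (Newer versions of Git behave differently here: combining --first-parent with -p shows each merge as a diff against its first parent, which would bring back the changes merged from master. If you see that, add --no-diff-merges, or keep --no-merges as in answer 2b.)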

What is this ^A thing?

This also ties into your other question:

BTW, I would appreciate it if anyone knows a quick way to find A without having to keep scrolling down the log.

The answer to this is to make a symbolic name for commit A. Find its ID once and then choose a name, like dev or master (but don't use either of those, since they are already in use!).

That's all dev and master are: they're just symbolic names for commits, plus the extra property that, as branch names, you can git checkout the symbolic name and wind up "on" the branch. You can give A a branch name too. Just make sure you never git checkout that branch and make commits on it: if you do, the name moves forward to the new commit instead of staying pointed at commit A.

Alternatively, you can make a tag-name pointing to commit A. That's almost exactly the same as a branch name. The two differences are:

  1. It's a tag name: you can't check it out as a branch, and therefore can't change it accidentally.
  2. It's a tag name: if you git push --tags you'll send it upstream as a tag, and then everyone else will have it too.

In this case point 1 is in your favor and point 2 is probably not, so it's up to you whether the advantage (can't accidentally change it) is worth the risk (might accidentally publish it).

If you have the ID of A, then, you can:

$ git tag A <id-of-commit-A>

and now you have the name A. (You can later git tag -d A to delete it, although if you accidentally published it, you'll probably keep getting it back from your upstream.)
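If scrolling through the log to find the ID of A is the painful part, one possible way to locate it once is to intersect the first-parent chains of the two branches. This is a swapped-in technique, not something the steps above prescribe, and it assumes every merge of master was made while on dev and that master's own first-parent chain is its real history. The scratch graph, file names, demo identity, and commit helper are hypothetical.

```shell
set -e
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# commit <subject> <file>: append one line to <file> and commit it (hypothetical helper).
commit() { echo "$1" >> "$2"; git add "$2"; git commit -q -m "$1"; }

# Rebuild the graph from the question (G assumed to merge F, J to merge K).
commit A master.txt
git checkout -q -b dev
commit B dev.txt; commit D dev.txt
git checkout -q master
commit C master.txt; commit E master.txt; commit F master.txt
git checkout -q dev
git merge -q -m G master
commit H dev.txt
git checkout -q master
commit I master.txt; commit K master.txt
git checkout -q dev
git merge -q -m J master
commit L dev.txt; commit M dev.txt

# The newest commit on dev's first-parent chain that is also on master's chain.
chain=$(mktemp)
git rev-list --first-parent master > "$chain"
fork=$(git rev-list --first-parent dev | grep -F -x -f "$chain" | head -n 1)
rm -f "$chain"

# Give it a name so it never has to be looked up again.
git tag A "$fork"
git log -1 --format='%h %s' A

# For contrast: a plain merge-base reports the most recently merged commit (K here).
git log -1 --format=%s "$(git merge-base master dev)"
```

The contrast at the end is why git merge-base master dev alone does not answer the question once master has been merged into dev.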

Getting back to the question of the ^A string in the git log commands, all ^A does is tell git rev-list (and therefore git log) to stop walking through the commit graph upon reaching commit A. The commit is also not shown (for git rev-list, not printed; for git log, not shown in the log output). The prefix ^ symbol is short for "not", i.e., "get me all commits reachable from dev, but not reachable from A". Adding --first-parent makes Git traverse only the first parent of each merge, so that we do not walk into commits merged from master.

(The ^A dev syntax can also be spelled A..dev. Note that this works with both git log and git rev-list, but means something very different for git diff.)
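A small check of the two spellings, as a sketch in a scratch repository (the file name, demo identity, and the three commits are invented for the example):

```shell
set -e
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Commit A, then two more commits on a dev branch.
echo one > file.txt
git add file.txt
git commit -q -m A
git tag A
git checkout -q -b dev
echo two >> file.txt
git commit -q -a -m B
echo three >> file.txt
git commit -q -a -m D

# For rev-list (and log), both spellings select the same set of commits.
caret=$(git rev-list ^A dev)
dots=$(git rev-list A..dev)
[ "$caret" = "$dots" ] && echo "^A dev and A..dev list the same commits"

# For git diff, A..dev does not select a set of commits at all:
# it simply compares the snapshot of A with the snapshot of dev.
[ "$(git diff A..dev)" = "$(git diff A dev)" ] && echo "git diff A..dev equals git diff A dev"
```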
