Josh de Kock - 2 months ago
Git Question

How to get all authors for current state of git?

I've been trying to find all the authors of a git project, so that I can ask them about relicensing their commits. I figured there'd be no point in contacting every author, since some of them may have contributed code that has since been removed. So I want to contact only the authors whose commits are still visible in the current HEAD.

I was told that git log had this capability, but I couldn't find anything on it except for something like:

git log --format='%an <%ae>'


This does roughly what I'd like to achieve, except that it doesn't exclude authors who have no code in the current codebase.

How can I achieve this?

Answer

IANAL, but as for the relicensing, I am not so sure that it is enough to have permission only from the authors who still have code in the current project. After all, the contributions of the other authors also led, in some way, to the current state of the project.

That aside, you may want to take a look at git blame. For each line of a file, it shows which commit last changed that line and who authored that commit. This should get you closer to the solution of your problem. Maybe some additional post-processing with awk ... | sort | uniq can do the rest.

However, git blame only shows information for a single file, so you would have to repeat that for all files in the repository.

In the root directory of the Git repository, you could use a shell command like this on Linux systems:

find . -name '*.cpp' -print0 | xargs -0 -I {} git blame --line-porcelain {} | sed -n 's/^author-mail //p' | sort | uniq

This searches for C++ source files (extension *.cpp) with find and runs git blame on each of them. The option --line-porcelain makes git blame print a machine-readable header for every line, including an author-mail line with the author's e-mail address. That is easier to filter than the default output, where the columns shift depending on whether a file name is shown, and where author names can consist of several words. sed then keeps only the author-mail lines and strips the prefix, leaving just the address. Finally, sort | uniq is used to get rid of duplicates, showing each address only once.

(Untested, but it may point you in the right direction.)
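As a variation on the pipeline above, here is a minimal sketch that walks the files Git tracks instead of using find, and counts how many lines at HEAD are attributed to each address. The function name blame_authors is made up for illustration. It assumes you run it from the root of the repository, and that paths git blame cannot handle (submodule entries, for example) are rare enough that their error messages can be ignored.

```shell
# Hypothetical helper: e-mail addresses of authors with surviving lines at HEAD,
# each prefixed by the number of lines attributed to that address.
# Optional arguments are passed to git ls-files as pathspecs, e.g. '*.cpp'.
blame_authors() {
    # -z / -0 keep file names with spaces or newlines intact.
    git ls-files -z -- "$@" |
        # One git blame per file, pinned to HEAD so uncommitted edits are ignored.
        xargs -0 -n 1 git blame --line-porcelain HEAD -- |
        # Header lines look like "author-mail <user@example.com>".
        LC_ALL=C sed -n 's/^author-mail //p' |
        # Count lines per address, largest share first.
        sort | uniq -c | sort -rn
}
```

Calling blame_authors with no arguments covers every tracked file, while blame_authors '*.cpp' restricts it to C++ sources. Keep in mind that blame credits only the last commit to touch each line, so a pure reformatting commit can hide the original author.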

If you just want every author who ever committed anything to the repository, use

git log --format='%an <%ae>' | sort | uniq

instead.
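If it would help to know how many commits each author has, git shortlog can summarize the same history. A minimal sketch, assuming it is run inside the repository; the function name all_authors is made up for illustration:

```shell
# Hypothetical helper: every author reachable from HEAD, with commit counts.
all_authors() {
    # -s: print counts only, -n: sort by count, -e: include e-mail addresses.
    # HEAD is passed explicitly because git shortlog reads from standard input
    # when no revision is given and stdin is not a terminal.
    git shortlog -sne HEAD
}
```

Like the git log command above, this lists everyone in the history, including authors whose code no longer survives at HEAD.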