Erik Aigner Erik Aigner - 2 months ago 18
Git Question

Git: Blame Statistics

How can I "abuse" blame (or some better suited function, and/or in conjunction with shell commands) to give me a statistic of how much lines (of code) are currently in the repository originating from each committer?

Example Output:

Committer 1: 8046 Lines
Committer 2: 4378 Lines


PS: I'm running OSX

Answer

Update

git ls-tree -r -z --name-only HEAD -- */*.c | xargs -0 -n1 git blame \
--line-porcelain HEAD |grep  "^author "|sort|uniq -c|sort -nr

I updated some things on the way.

for the lazy you can also put this into it's own command:

#!/bin/bash

# save as i.e.: git-authors and set the executable flag
git ls-tree -r -z --name-only HEAD -- $1 | xargs -0 -n1 git blame \
 --line-porcelain HEAD |grep  "^author "|sort|uniq -c|sort -nr

store this somewhere in your path or modify your path and use it like

  • git authors '*/*.c' # look for all files recursively ending in .c
  • git authors '*/*.[ch]' # look for all files recursively ending in .c or .h
  • git authors 'Makefile' # just count lines of authors in the Makefile

Original Answer

While the accepted answer does the job it's very slow.

$ git ls-tree --name-only -z -r HEAD|egrep -z -Z -E '\.(cc|h|cpp|hpp|c|txt)$' \
  |xargs -0 -n1 git blame --line-porcelain|grep "^author "|sort|uniq -c|sort -nr

is almost instantaneous.

To get a list of files currently tracked you can use

git ls-tree --name-only -r HEAD

This solution avoids calling file to determine the filetype and uses grep to match the wanted extension for performance reasons. If all files should be included, just remove this from the line.

grep -E '\.(cc|h|cpp|hpp|c)$' # for C/C++ files
grep -E '\.py$'               # for Python files

if the files can contain spaces, which are bad for shells you can use:

git ls-tree -z --name-only -r HEAD | egrep -Z -z '\.py'|xargs -0 ... # passes newlines as '\0'

Give a list of files (through a pipe) one can use xargs to call a command and distribute the arguments. Commands that allow multiple files to be processed obmit the -n1. In this case we call git blame --line-porcelain and for every call we use exactly 1 argument.

xargs -n1 git blame --line-porcelain

We then filter the output for occurences of "author " sort the list and count duplicate lines by:

grep "^author "|sort|uniq -c|sort -nr

Note

Other answers actually filter out lines that contain only whitespaces.

grep -Pzo "author [^\n]*\n([^\n]*\n){10}[\w]*[^\w]"|grep "author "

The command above will print authors of lines containing at least one non-whitespace character. You can also use match \w*[^\w#] which will also exclude lines where the first non-whitespace character isn't a # (comment in many scripting languages).