Jake88 - 3 months ago
Git Question

Why is my git smudge filter slow?

I have the following filters in git (I've expanded them onto lines for a bit more readability):

git config --global filter.lastcommit.smudge
'IFS="";
lastcommit=`git log -1 --format="%H" -- %f`;
filename=$(basename %f);
while read -r;
do
line="${REPLY//\$Revision\$/\$Revision: $lastcommit\$}";
line="${line//\$RCSfile\$/\$RCSfile: $filename\$}";
echo $line;
done'

git config --global filter.dater.smudge
'IFS=""; myDate=`git log --pretty=format:"%cI" -1 -- %f`;
while read -r;
do line="${REPLY//\$Date\$/\$Date: $myDate\$}";
echo $line;
done'


When I check out a branch, the filters run on roughly 260 files across the three directories that have a .gitattributes file, and filtering can take up to a few seconds per file:

Checking out files: 100% (6659/6659), done.
Branch ABC set up to track remote branch ABC from origin.
Switched to a new branch 'ABC'

real 4m42.334s
user 4m15.607s
sys 0m29.788s


Is there a way to improve the performance?

SOLUTION: I switched to using sed after reading @torek's answer. The filters are now:

git config --global filter.lastcommit.smudge
'lastcommit=`git log -1 --format="%H" -- %f`;
filename=$(basename %f);
sed -e "s/\\\$Revision\\\$/\$Revision: $lastcommit\$/"
-e "s/\\\$RCSfile\\\$/\$RCSfile: $filename\$/";'

git config --global filter.dater.smudge
'myDate=`git log --pretty=format:"%cI" -1 -- %f`;
sed -e "s/\\\$Date\\\$/\$Date: $myDate\$/";'


And the gains are very impressive:

~/panos > time git checkout ABC
Checking out files: 100% (6659/6659), done.
Branch ABC set up to track remote branch ABC from origin.
Switched to a new branch 'ABC'

real 0m12.956s
user 0m10.417s
sys 0m2.452s


I was under the impression that a fork to sed would have been more costly than using pure bash. Turns out sed is much more efficient for text replacement. I learned something today.

Answer

Note: I've never used clean and smudge filters but the concept seems simple enough.

You're doing the smudge operation entirely in bash code, using read -r. This is highly inefficient (for good reasons that don't apply to your case) and is likely the source of all the slowness. Using sed instead should speed things up enormously.
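
To see the difference on your own machine, a rough benchmark along these lines may help. It is only a sketch: the function names, the placeholder hash and the generated sample input are made up for illustration, and the input is fed through a pipe because that is how a filter receives its data. The g flag on the sed command mirrors the replace-all behaviour of bash's ${var//pattern/replacement}.

```shell
#!/usr/bin/env bash
# Hypothetical placeholder for the hash that `git log -1 --format=%H` would print.
lastcommit="0123abcd"

smudge_loop() {
    # Pure-bash style: one read and one substitution per input line.
    local IFS="" line
    while read -r line; do
        printf '%s\n' "${line//\$Revision\$/\$Revision: $lastcommit\$}"
    done
}

smudge_sed() {
    # sed style: a single process streams the whole input.
    sed -e "s/\\\$Revision\\\$/\$Revision: $lastcommit\$/g"
}

# Generate a made-up sample file with a keyword on every line.
sample=$(mktemp)
for ((i = 0; i < 10000; i++)); do
    printf 'line %d // $Revision$\n' "$i"
done > "$sample"

# Time both variants on the same piped input.
time cat "$sample" | smudge_loop > /dev/null
time cat "$sample" | smudge_sed > /dev/null

rm -f "$sample"
```

The absolute numbers will vary with the shell version and the size of the files, so treat the output as a relative comparison only.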

The -e s/regexp/replacement/ arguments to sed are basically the same as the substitutions here (you need only protect the final $ from regexp matching) so it should be quite straightforward to switch.
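
As an illustration of the escaping rule, here is a minimal sketch for the $Date$ keyword. The date value and the function name expand_date are placeholders, not something taken from your setup. It assumes the substituted value contains no / or & characters, which sed would treat specially in the replacement.

```shell
# Hypothetical value standing in for the output of `git log -1 --format=%cI -- file`.
myDate="2017-03-01T12:00:00+01:00"

expand_date() {
    # Single quotes keep the shell from touching the sed script. Only the
    # trailing $ of the pattern is escaped: at the end of a basic regular
    # expression an unescaped $ would act as an end-of-line anchor.
    # The variable is spliced in between the two single-quoted pieces.
    sed -e 's/$Date\$/$Date: '"$myDate"'$/'
}

printf '%s\n' '# Last change: $Date$' | expand_date
```

Mixing single and double quotes this way avoids the stacked backslashes that are needed when the whole sed script sits inside double quotes.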
