I have a few .resx files in my repository containing string translations for my app. This works well, except for merge conflicts when new strings have been added to the end of the file in separate git branches. KDiff3 doesn't play well with merging XML lists of pairs.
The resx file is basically a list of key/value-pairs with no particular ordering. To avoid merge conflicts, I therefore want to sort the pairs alphabetically prior to committing, and I have used the excellent SortRESX program to do this using a git filter:
git config --global filter.resx.clean SortRESX
git config --global filter.resx.smudge cat
However, if I check out an unsorted file it will be sorted immediately by the filter.
This should not happen (I think!). I believe something else is actually happening.
The intent of a clean filter is that it is applied when the file is added to the index, and a smudge filter is applied when the file is extracted from the index into the work-tree (this is why git checkout must first write a file from a commit into the index, before copying it to the work-tree, when you use git checkout <commit> -- <path>). Note that end-of-line / CRLF transformations are treated as a form of filter (done internally if possible, but done on a pipe from or to your actual user-supplied filter if needed).
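To see which filter runs at which step, here is a minimal sketch using a throwaway repository. The filter name demo, the file names, and the two logging scripts are hypothetical; each script just records that it ran and then passes the content through unchanged.

```shell
#!/bin/sh
# Sketch only: log when the clean and smudge filters are invoked.
set -e
work=$(mktemp -d)
log="$work/filter.log"
: > "$log"

# Hypothetical pass-through filters that append their name to the log.
printf '#!/bin/sh\necho clean >> "%s"\ncat\n' "$log" > "$work/clean.sh"
printf '#!/bin/sh\necho smudge >> "%s"\ncat\n' "$log" > "$work/smudge.sh"

git init -q "$work/repo"
cd "$work/repo"
git config user.name demo
git config user.email demo@example.com
git config filter.demo.clean "sh '$work/clean.sh'"
git config filter.demo.smudge "sh '$work/smudge.sh'"

echo '*.txt filter=demo' > .gitattributes
echo hello > a.txt

# Work-tree -> index: this is where the clean filter should run.
git add .gitattributes a.txt
after_add=$(cat "$log")
git commit -q -m init

# Index -> work-tree: this is where the smudge filter should run.
rm a.txt
: > "$log"
git checkout -- a.txt
after_checkout=$(cat "$log")

echo "after add:      $after_add"
echo "after checkout: $after_checkout"
```

The log is captured right after git add, before the commit, because later commands may re-examine the file and invoke the clean filter again.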
(It's possible that there's some code I missed somewhere that runs the clean filter in some extra case. But I don't think so: this part of the Git source is fairly obvious.)
I believe what is happening is more subtle. When Git applies a smudge filter, it automatically marks the cache entry "dirty" in the index. (This code is considerably less obvious, so I could be wrong here.) Because of this marking, when Git goes to check the status of the file, it says: "Hmm, this cache entry is marked dirty, I'd best run the clean filter on it and find out for sure." So it runs your clean filter, which sorts the key/value pairs, then compares the result to the underlying blob. These differ, so Git now declares the work-tree entry "truly dirty", even though the original, unsorted work-tree entry actually matches the current commit.
In other words, Git assumes that the equivalent of git cat-file -p <hash-id> | smudge | clean produces the same bits as git cat-file -p <hash-id>, and if it doesn't, you should commit the file, which is generally true when you are attempting to normalize line endings as stored in the repository. That doesn't mean that the checked-out copy is sorted; your cat filter (which, incidentally, you can discard: a nonexistent filter means "leave this alone") did not sort the file, and the working tree copy is still unsorted. It's just that Git insists that it should become sorted.
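The round-trip assumption can be checked by hand with plumbing commands. This is a sketch under stated assumptions: plain sort stands in for SortRESX, cat plays the smudge filter, and the two-line blob is made up for illustration.

```shell
#!/bin/sh
# Sketch only: compare a blob's hash with the hash of clean(smudge(blob)).
set -e
export LC_ALL=C   # make the sort order deterministic
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

# Store an unsorted blob directly in the object database.
blob=$(printf 'b\na\n' | git hash-object -w --stdin)

# smudge = cat (no-op), clean = sort (stand-in for SortRESX).
sorted_clean=$(git cat-file blob "$blob" | cat | sort | git hash-object --stdin)

# smudge = cat, clean = cat: the round trip is the identity.
noop_clean=$(git cat-file blob "$blob" | cat | cat | git hash-object --stdin)

echo "blob:         $blob"
echo "sorting clean: $sorted_clean"
echo "no-op clean:   $noop_clean"
```

With the sorting filter the round-trip hash differs from the stored blob, which is the condition under which Git reports the file as modified; with a no-op clean filter the two hashes agree.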
What this means in the end is that the answer to:
How can I discard the changes made by the filter without committing?
is to simply ignore Git's complaints, and check out other commits anyway. You may have to use the --force flag to do this though, which is unsettling at best (and at worst, can cause you to lose changes you intended to keep!). So there's a slightly better (ish) method: temporarily disable the "clean" filter (by editing
With the filter disabled (or replaced with cat, which does the same thing only slower), Git will now, upon checking status, see that the "dirty" flag is set, and repeat its "Hmm, I'd best run the clean filter" thing. This time the filter is a no-op, the resulting bits match the blob, and Git clears the dirty flag. You can now restore the filter at any point, because the cache entry is no longer marked dirty, and Git will skip all this testing.
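Here is a rough sketch of that disable, check status, restore sequence in a throwaway repository. Several things are assumptions for illustration: the filter name sortdemo, plain sort standing in for SortRESX, git config --unset as one possible way to disable the filter, and touch as a stand-in for whatever makes Git decide to re-examine the file.

```shell
#!/bin/sh
# Sketch only: a sorting clean filter makes an unchanged file look modified;
# disabling the filter for one status check clears that.
set -e
export LC_ALL=C
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name demo
git config user.email demo@example.com

# Commit an unsorted file before any filter is configured.
printf 'b\na\n' > data.txt
git add data.txt
git commit -q -m 'unsorted file'

# Attach a sorting clean filter (hypothetical name "sortdemo").
mkdir -p .git/info
echo 'data.txt filter=sortdemo' > .git/info/attributes
git config filter.sortdemo.clean sort

# Assumption: a changed mtime forces Git to re-compare the content.
touch -t 200001010000 data.txt
with_filter=$(git status --porcelain)      # file is reported as modified

# Disable the filter, let status re-check and refresh the index entry.
git config --unset filter.sortdemo.clean
without_filter=$(git status --porcelain)   # content matches the blob

# Restore the filter; the entry is up to date, so Git skips the comparison.
git config filter.sortdemo.clean sort
restored=$(git status --porcelain)

printf 'with filter:    [%s]\n' "$with_filter"
printf 'without filter: [%s]\n' "$without_filter"
printf 'restored:       [%s]\n' "$restored"
```

Throughout, data.txt in the work-tree stays unsorted; only Git's opinion of whether it is modified changes.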
(It might be nice to have a way to get Git to try two comparisons before declaring a file "truly dirty": one using the actual, configured clean filter, and then if that says "dirty", one more time using no filter. That would automatically decide that work-tree files based on an "uncleaned" in-repo blob, but which ultimately match that blob anyway, are in fact "not dirty". Of course this would mean you would not be encouraged to fix your line endings, but if this were a user-defined switch, you could set it for old repositories containing unclean objects, the same way you can set merge.renormalize for such repositories.)