MarkE - 1 month ago
Git Question

Why do I have duplicate commits after running a git filter?

I ran the following command on my Git repository to force all text files to use Unix line endings.

git filter-branch --force --tree-filter 'git ls-files | xargs file | sed -n -e "/.*: .*text.*/s/\(.*\): .*/\1/p" | xargs dos2unix' --tag-name-filter cat -- --all


According to git log, this resulted in a duplicate of every commit in the repository: the author, date, and comment are the same, and the hash, as expected, is new. Is that what I should have expected? I thought it would replace the existing commits with versions containing the converted files.

Is there a better way to do the conversion that wouldn't result in all the extraneous commits?

Answer

Git commits are immutable, so any time you want to change anything about a given commit, you actually have to create a new commit instead. This includes file content, author date/time, or parent commit. (Thus, if you change the content in one commit, you must create new commits for all those that follow.)
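As a rough illustration of that immutability, here is a minimal sketch in a throwaway repository. The repository, file name, and identity are made up for the demo, and core.autocrlf is switched off so that Git does not normalize the line endings itself. Changing only the file content still yields a commit with a different hash, while the old commit object stays in the object database.

```shell
# Throwaway repository in a temporary directory (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git config core.autocrlf false   # keep CRLF bytes exactly as written

# First version of the commit: a file with a Windows line ending.
printf 'hello\r\n' > file.txt
git add file.txt
git commit -q -m "Add file"
old=$(git rev-parse HEAD)

# Change only the content; the message and author date are kept.
printf 'hello\n' > file.txt
git add file.txt
git commit -q --amend --no-edit
new=$(git rev-parse HEAD)

echo "before: $old"
echo "after:  $new"

# The original commit was not modified in place; it still exists as an object.
git cat-file -t "$old"
```

The two hashes differ even though the message and author date are identical, which is the same effect filter-branch produces for every commit it touches.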

So yes, this is what you should expect, and no, there isn't a way to do the conversion that doesn't produce new commits. This is true for any command that rewrites history, including rebase as well as filter-branch.
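One likely reason the old commits still show up next to the new ones is that filter-branch saves the pre-rewrite refs under refs/original/, so any view that walks all refs (for example git log --all) sees both histories. The sketch below is an assumption about that setup, not a diagnosis of this particular repository. It uses a made-up one-commit repository, and tr stands in for dos2unix so the demo does not depend on that tool being installed.

```shell
# Throwaway repository with a single commit containing a CRLF file
# (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git config core.autocrlf false   # keep CRLF bytes exactly as written

printf 'hello\r\n' > file.txt
git add file.txt
git commit -q -m "Add file"

# Rewrite history; tr is a stand-in for dos2unix in this demo.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
  --tree-filter 'tr -d "\r" < file.txt > file.tmp && mv file.tmp file.txt' \
  -- --all >/dev/null 2>&1

# filter-branch kept the old branch tip as a backup ref.
git for-each-ref refs/original/
before=$(git rev-list --all --count)

# Once the rewritten history has been checked, drop the backup refs.
git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d
after=$(git rev-list --all --count)

echo "commits reachable from all refs: before=$before after=$after"
```

Deleting the backup refs only hides the old commits from ref-based views. The objects themselves remain until the reflog entries expire and Git garbage-collects them, and anyone who cloned the repository earlier still has the old history.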

Read more about how this works in the Git Internals - Git Objects section in the Pro Git book.