Sk8erPeter - 3 months ago
PowerShell Question

Git Shell in Windows: patch's default character encoding is UCS-2 Little Endian - how to change this to ANSI or UTF-8 without BOM?

When creating a diff patch with Git Shell in Windows (when using GitHub for Windows), the character encoding of the patch will be UCS-2 Little Endian according to Notepad++ (see the screenshots below).

How can I change this behavior, and force git to create patches with ANSI or UTF-8 without BOM character encoding?

This causes a problem because UCS-2 Little Endian encoded patches cannot be applied; I have to convert them to ANSI manually. If I don't, I get a "fatal: unrecognized input" error.

Creating git patch

Notepad++ screenshot of the character encoding




Since then, I also realized that I have to manually convert the EOL from Windows format (\r\n) to UNIX (\n) in Notepad++ (Edit > EOL Conversion > UNIX). If I don't do this, I get a "trailing whitespace" error (even if all trailing whitespace is trimmed: "TextFX" > "TextFX Edit" > "Trim Trailing Spaces").

So, the steps I need to do for the patch to be applied:


  1. create patch (here is the result)

  2. convert character encoding to ANSI

  3. EOL conversion to UNIX format

  4. apply patch



Please, take a look at this screenshot:

Applying a patch in Windows Powershell with Git is problematic

Answer

I'm not a Windows user, so take my answer with a grain of salt. According to the Windows PowerShell Cookbook, PowerShell preprocesses the output of git diff, splitting it into lines. The documentation of the Out-File cmdlet suggests that > is the same as | Out-File without parameters. We also find this comment in the PowerShell documentation:

The results of using the Out-File cmdlet may not be what you expect if you are used to traditional output redirection. To understand its behavior, you must be aware of the context in which the Out-File cmdlet operates.

By default, the Out-File cmdlet creates a Unicode file. This is the best default in the long run, but it means that tools that expect ASCII files will not work correctly with the default output format. You can change the default output format to ASCII by using the Encoding parameter:

[...]

Out-file formats file contents to look like console output. This causes the output to be truncated just as it is in a console window in most circumstances. [...]

To get output that does not force line wraps to match the screen width, you can use the Width parameter to specify line width.

So, apparently it is not Git which chooses the character encoding, but Out-File. This suggests a) that PowerShell redirection really should only be used for text and b) that

| Out-File -encoding ASCII -Width 2147483647 my.patch

will avoid the encoding problems. However, this still does not solve the problem of Windows vs. Unix line endings. There are cmdlets (see the PowerShell Community Extensions) that convert line endings.
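For a patch that has already been written in the wrong format, the recoding and the line-ending conversion could also be scripted instead of done by hand in an editor. The following is only a minimal sketch in Python, which is not part of the toolchain discussed here. The function names are hypothetical, and it assumes the patch content is text that decodes as UTF-16 (with BOM) or UTF-8. It cannot restore lines that Out-File has already wrapped or characters that were already lost, and it rewrites every \r\n, including any that belong to the patched files themselves.

```python
import codecs


def normalize_patch(raw: bytes) -> bytes:
    """Return patch bytes as BOM-less UTF-8 with LF line endings."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec uses the BOM to pick the byte order and drops it.
        text = raw.decode("utf-16")
    else:
        # Assumption: anything else is UTF-8; "utf-8-sig" strips a UTF-8 BOM if present.
        text = raw.decode("utf-8-sig")
    # Convert Windows line endings to Unix line endings.
    text = text.replace("\r\n", "\n")
    # Encoding to "utf-8" never writes a BOM.
    return text.encode("utf-8")


def normalize_patch_file(src: str, dst: str) -> None:
    """Read src in binary mode and write the normalized bytes to dst."""
    with open(src, "rb") as f:
        raw = f.read()
    # Binary mode on output keeps Python from translating "\n" back to "\r\n".
    with open(dst, "wb") as f:
        f.write(normalize_patch(raw))
```

A call such as normalize_patch_file("my.patch", "my-fixed.patch") (both file names are placeholders) would then replace the manual conversion steps.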

However, all this recoding does not increase my confidence in a patch (which has no encoding itself, but is just a string of bytes). The aforementioned Cookbook contains a script, Invoke-BinaryProcess, which can be used to redirect the output of a command unmodified.
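The same idea, capturing a command's output as raw bytes so that nothing recodes it, can be illustrated outside PowerShell. This is a sketch in Python, not the Cookbook's Invoke-BinaryProcess script; run_to_file is a hypothetical helper name.

```python
import subprocess


def run_to_file(command, path):
    """Run command and write its stdout bytes to path without any recoding."""
    # The file is opened in binary mode and handed directly to the child
    # process as its stdout, so the bytes never pass through a text layer.
    with open(path, "wb") as out:
        subprocess.run(command, stdout=out, check=True)
```

Assuming git is on the PATH, run_to_file(["git", "diff"], "my.patch") should produce exactly the bytes that git diff emits.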

To sidestep this whole issue, an alternative would be to use git format-patch instead of git diff. format-patch writes directly to a file (and not to stdout), so its output is not recoded. However, it can only create patches from commits, not arbitrary diffs.

format-patch takes a commit range (e.g. master~10..master~5) or a single commit (e.g. X, meaning X..HEAD) and creates patch files of the form NNNN-SUBJECT.patch, where NNNN is an increasing 4-digit number and SUBJECT is the (mangled) subject of the commit. An output directory can be specified with -o.
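As a rough illustration of how such an invocation is put together, here is a small Python wrapper. It is a sketch only: the helper names are made up, it assumes git is on the PATH and that the current directory is inside the repository, and it relies on format-patch printing the names of the files it creates.

```python
import subprocess


def format_patch_command(revisions, output_dir=None):
    """Build the argument list for git format-patch (hypothetical helper)."""
    command = ["git", "format-patch"]
    if output_dir is not None:
        # -o selects the directory the patch files are written to.
        command += ["-o", output_dir]
    command.append(revisions)
    return command


def format_patch(revisions, output_dir=None):
    """Run git format-patch and return the patch file names it prints."""
    result = subprocess.run(
        format_patch_command(revisions, output_dir),
        check=True,
        capture_output=True,
        text=True,
    )
    # The patches themselves are written by git, not through this pipe;
    # only the list of created file names is read here.
    return result.stdout.splitlines()
```

For example, format_patch("master~10..master~5", "patches") would write one patch file per commit in that range into a patches directory (a placeholder name).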
