René Nyffenegger - 1 year ago
Perl Question

In what encoding does readpipe return the result of an executed command?

Here's a simple Perl script that is supposed to write a utf-8 encoded file:

use warnings;
use strict;

open (my $out, '>:encoding(utf-8)', 'tree.out') or die;

print $out readpipe ('tree ~');

close $out;

I expected readpipe to return a utf-8 encoded string since
is set to
. However, looking at tree.out (while making sure the editor recognizes it as utf-8 encoded) shows me only garbled text.

If I change the utf-8 encoding in the open statement to latin-1, the script creates a utf-8 file with the expected content.

This is all a bit strange to me. What is the explanation for this behavior?

tjd
Answer

readpipe returns a string of undecoded bytes to Perl. We know that the string is UTF-8 encoded, but you have not told Perl that.

The IO layer on your output handle takes that string, assumes each byte is a Unicode code point, and encodes those code points as UTF-8 bytes. Every non-ASCII character therefore ends up encoded twice.

The latin-1 IO layer appears to work because it writes out each undecoded byte unchanged: the first 256 Unicode code points correspond exactly to latin-1.
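
The difference between the two layers can be reproduced without running any command. The following is only a sketch: it assumes the command emitted the single character "é" as UTF-8, and it uses Encode's encode function to stand in for what the two IO layers do on output. The helper hex_bytes is a hypothetical name introduced for display only.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: show a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($string) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $string;
}

# "é" (U+00E9) as the two UTF-8 bytes a command would hand to readpipe.
# Perl has not been told these are UTF-8, so it sees two separate characters.
my $bytes = "\xC3\xA9";

# Roughly what the :encoding(utf-8) layer does: each byte is taken as a
# code point (U+00C3, U+00A9) and encoded again.
my $double = encode('UTF-8', $bytes);

# Roughly what the :encoding(latin-1) layer does: code points 0..255 map
# one-to-one onto bytes, so the original bytes pass through unchanged.
my $passthrough = encode('ISO-8859-1', $bytes);

print 'from the command: ', hex_bytes($bytes),       "\n";  # C3 A9
print 'utf-8 layer:      ', hex_bytes($double),      "\n";  # C3 83 C2 A9
print 'latin-1 layer:    ', hex_bytes($passthrough), "\n";  # C3 A9
```

The four bytes C3 83 C2 A9 are what an editor displays as "Ã©", which matches the garbled text described in the question.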

The proper fix is to decode the byte string returned by readpipe into a string of code points before feeding it to an IO layer. The statement use open ':utf8', as mentioned by Borodin, should be a viable solution, since readpipe is specifically mentioned in the open manual page.
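
A minimal sketch of the explicit-decode approach, under some assumptions that are not part of the original script: a shell printf command stands in for tree ~ so that the output is predictable (this assumes a POSIX shell whose printf understands octal escapes), and an in-memory file stands in for tree.out so the written bytes can be inspected.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for `tree ~`: a command that emits "é" and a newline as UTF-8 bytes.
my $bytes = readpipe(q{printf '\303\251\n'});

# Tell Perl what the bytes mean by turning them into a string of code points.
my $text = decode('UTF-8', $bytes);

# An in-memory file stands in for tree.out.
my $buffer = '';
open(my $out, '>:encoding(UTF-8)', \$buffer) or die "cannot open buffer: $!";
print {$out} $text;
close($out) or die "cannot close buffer: $!";

# With the decode step in place, the bytes written equal the bytes read.
print $buffer eq $bytes ? "round trip intact\n" : "bytes were altered\n";
```

In the original script the same idea would amount to wrapping the readpipe call in decode('UTF-8', ...) before printing to $out, assuming the command really does emit UTF-8.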