René Nyffenegger - 7 months ago
Perl Question

In what encoding does readpipe return the result of an executed command?

Here's a simple Perl script that is supposed to write a utf-8 encoded file:

use warnings;
use strict;

open (my $out, '>:encoding(utf-8)', 'tree.out') or die;

print $out readpipe ('tree ~');

close $out;


I expected readpipe to return a utf-8 encoded string, since LANG is set to en_US.UTF-8. However, looking at tree.out (while making sure the editor recognizes it as utf-8 encoded) shows me all garbled text.

If I change the >:encoding(utf-8) in the open statement to >:encoding(latin-1), the script creates a utf-8 file with the expected text.

This is all a bit strange to me. What is the explanation for this behavior?

tjd
Answer

readpipe returns a string of undecoded bytes to Perl. We know that the string is UTF-8 encoded, but you haven't told Perl.

The IO layer on your output handle takes that string, assumes each byte is a Unicode code point, and encodes those code points as UTF-8 bytes. Every byte above 0x7F therefore becomes two bytes, so the text ends up encoded twice.

The latin-1 IO layer appears to work because it writes each undecoded byte out unchanged: the first 256 Unicode code points correspond one-to-one with latin-1.
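The difference between the two layers can be reproduced without running an external command. The sketch below is only an illustration: the sample bytes are an assumption (the UTF-8 encoding of "é", as a command might emit it), and Encode::encode stands in for what the output layer does to an undecoded string.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumed sample input: the two UTF-8 bytes for U+00E9, not yet decoded.
my $bytes = "\xC3\xA9";

# A UTF-8 encoder treats each byte as a code point (U+00C3, U+00A9)
# and encodes it again, so two bytes become four.
my $double = encode('UTF-8', $bytes);

# A latin-1 encoder maps code points 0..255 straight back to the same bytes.
my $latin1 = encode('ISO-8859-1', $bytes);

printf "utf-8 layer:   %vX\n", $double;   # C3.83.C2.A9
printf "latin-1 layer: %vX\n", $latin1;   # C3.A9
```

The latin-1 result is byte-for-byte the original input, which is why the file looks right even though Perl never knew the data was UTF-8.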

The proper fix is to decode the byte string returned by readpipe into a string of code points before feeding it to an IO layer. The statement use open ':utf8', as mentioned by Borodin, should be a viable solution, since readpipe is specifically mentioned in the documentation for the open pragma.
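A minimal sketch of the explicit decoding step, based on the script from the question. It assumes the command really emits UTF-8, which depends on the locale (for example LANG=en_US.UTF-8), and it uses the core Encode module rather than the open pragma.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Capture the command's output; at this point it is a string of raw bytes.
my $bytes = readpipe('tree ~');

# Turn the bytes into characters. Assumption: the output is UTF-8.
my $text = decode('UTF-8', $bytes);

# The output layer now encodes real code points, so nothing is encoded twice.
open(my $out, '>:encoding(UTF-8)', 'tree.out')
    or die "Cannot open tree.out: $!";
print {$out} $text;
close($out) or die "Cannot close tree.out: $!";
```

Decoding at the point of input and encoding at the point of output keeps every string inside the program in one consistent form, which is the same effect the open pragma aims for declaratively.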
