René Nyffenegger - 1 year ago
Perl Question

Are there any gotchas with open(my $f, '<:encoding(UTF-8)', $n)?

I am having a problem that I am unable to reproduce in a form suitable for Stack Overflow, although it is reproducible in my production environment.

The problem occurs in a Perl script that, among other things, iterates over a file that looks like this:

abc-4-9|free text, possibly containing non-ascii characters|
cde-3-8|hällo wörld|
# comment

xyz-9-1|and so on|
qrs-2-8|and so forth|

I can verify the correctness of the file with this Perl script:

use warnings;
use strict;

open (my $f, '<:encoding(UTF-8)', 'c:\path\to\file') or die "$!";

while (my $s = <$f>) {
    next unless $s;
    next if $s =~ m/^#/;    # skip comment lines
    $s =~ m!(\w+)-(\d+)-(\d+)\|([^|]*)\|! or die "\n>$s<\n didn't match on line $.";
}

print "Ok\n";
close $f;

When I run this script, it does not die on line 10 and consequently prints Ok.

Now, I use essentially the same construct in a huge Perl script (hence not reproducible for Stack Overflow), and there it dies on line 2199 of the input file.
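One way to narrow down a failure like this is to print the code points of the offending line, so that invisible characters such as a stray carriage return, a byte order mark, or U+FFFD become visible in the die message. The following is only a minimal sketch; the helper name dump_codepoints and the sample line are assumptions made for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: render every character of a string as U+XXXX so that
# invisible characters (CR, BOM, U+FFFD, ...) show up in diagnostics.
sub dump_codepoints {
    my ($s) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $s;
}

# Sample line (an assumption) that still carries a CR before the LF, as it
# would if no CRLF translation took place while reading a Windows file.
my $line = "cde-3-8|h\x{E4}llo|\r\n";
print dump_codepoints($line), "\n";
```

In the failing script, the same helper could be called inside the die message in place of the plain >$s< output.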

If I change the first line (which is completely unrelated to line 2199) from something like

www-1-1|A line with some words|

to something else, the script will process line 2199 (but fail later).

Interestingly, this behaviour was introduced when I changed

open (my $f, '<', 'c:\path\to\file') or die "$!";

to

open (my $f, '<:encoding(UTF-8)', 'c:\path\to\file') or die "$!";

Without the :encoding(UTF-8) directive, the script does not fail. Of course, I need the encoding directive since the file contains non-ASCII characters.

BTW, the same script runs without problems on Linux.

On Windows, where it fails, I use Strawberry Perl 5.24
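Since the behaviour differs between Linux and Windows, it can help to look at which PerlIO layers actually end up on the handle for each mode string. The sketch below uses PerlIO::get_layers for that; the helper name layers_for and the temporary file are assumptions made for illustration. On Linux the default layers are typically unix and perlio, whereas Windows builds usually include crlf, so the output is expected to differ between the two platforms.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: open $path with the given mode string and return the
# names of the PerlIO layers that are active on the resulting handle.
sub layers_for {
    my ($mode, $path) = @_;
    open(my $fh, $mode, $path) or die "Cannot open $path: $!";
    my @layers = PerlIO::get_layers($fh);
    close $fh;
    return @layers;
}

# Temporary stand-in for the real input file (UTF-8 bytes, CR LF line ending).
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "cde-3-8|h\xC3\xA4llo w\xC3\xB6rld|\r\n";
close $tmp;

# Compare the layer stack with and without the encoding directive.
for my $mode ('<', '<:encoding(UTF-8)') {
    my @layers = layers_for($mode, $path);
    print "$mode => @layers\n";
}
```

Running this on both machines shows whether the layer stacks differ in anything other than the encoding layer.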

Answer

I do not have a full and correct explanation of why this is necessary, but you can try opening the file with


This may be related to my question "Why is CRLF set for the unix layer on Windows?", which I came across while investigating a problem that I never managed to resolve.
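One approach sometimes used when the platform's default layers are suspected, which is not necessarily what this answer proposes, is to spell out the layer stack and handle line endings by hand. The sketch below assumes the mode string '<:raw:encoding(UTF-8)' and strips CR LF explicitly; the helper name read_records and the sample data are hypothetical, and whether this avoids the failure in the original script is untested.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: parse "key-n-m|text|" records from a UTF-8 file,
# removing line endings explicitly instead of relying on a :crlf layer.
sub read_records {
    my ($path) = @_;
    open(my $fh, '<:raw:encoding(UTF-8)', $path)
        or die "Cannot open $path: $!";
    my @records;
    while (my $s = <$fh>) {
        $s =~ s/\r?\n\z//;                  # strip LF or CR LF by hand
        next if $s eq '' or $s =~ m/^#/;    # skip blank lines and comments
        $s =~ m!^(\w+)-(\d+)-(\d+)\|([^|]*)\|!
            or die "\n>$s<\n didn't match on line $.";
        push @records, [$1, $2, $3, $4];
    }
    close $fh;
    return @records;
}

# Temporary stand-in for the real input file, written with CR LF endings.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "cde-3-8|h\xC3\xA4llo w\xC3\xB6rld|\r\n",
             "# comment\r\n",
             "\r\n",
             "xyz-9-1|and so on|\r\n";
close $tmp;

my @records = read_records($path);
print scalar(@records), " records parsed\n";
```

Because the line endings are removed in the loop, the same code reads files with LF or CR LF endings regardless of the platform's default layers.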