Hello World Hello World - 3 months ago 17
Perl Question

Why do I get garbled output when I decode some HTML entities but not others?

In Perl, I am trying to decode strings which contain numeric HTML entities using HTML::Entities. Some entities work, while "newer" entities don't. For example:

decode_entities('®'); # returns ® as expected
decode_entities('Ω'); # returns Ω instead of Ω
decode_entities('★'); # returns ★ instead of ★

Is there a way to decode these "newer" HTML entities in Perl? In PHP, the
function seems to decode all of these entities without any problem.


The decoding works fine. It's how you're outputting them that's wrong. For example, you may have sent the strings to a terminal without encoding them for that terminal first. This is achieved through the open pragma in the following program:

$ perl -e'
    use open ":std", ":encoding(UTF-8)";
    use HTML::Entities qw( decode_entities );
    CORE::say decode_entities($_)
       for "®", "Ω", "★";