In Perl, I am working with the following UTF-8 text:
my $string = 'a 3.9 kΩ resistor and a 5 µF capacitor';
However, when I run the following:
decode_entities('a 3.9 kΩ resistor and a 5 µF capacitor');
I get:
a 3.9 kΩ resistor and a 5 ÂµF capacitor
Assuming you are using the Encode module from CPAN, you can try this:
my $string = "..."; $string = decode_entities(decode('utf-8', $string));
This may seem illogical: if Perl is natively UTF-8 itself, why should you need to decode a UTF-8 string? Decoding is simply how you tell Perl that the bytes you have are UTF-8 and should be interpreted as characters; until you do, Perl treats every byte as a character of its own.
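To make the bytes-versus-characters distinction concrete, here is a minimal sketch using only the core Encode module. The variable names are illustrative, and the µ is written as byte escapes so the result does not depend on how the source file is saved.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The UTF-8 bytes of "µF", written as escapes (illustrative input).
my $bytes = "\x{C2}\x{B5}F";

# Before decoding, Perl sees three one-byte characters.
my $byte_len = length $bytes;

# decode() turns the UTF-8 byte sequence into a character string.
my $chars    = decode('UTF-8', $bytes);
my $char_len = length $chars;

printf "before decode: %d, after decode: %d\n", $byte_len, $char_len;
# prints "before decode: 3, after decode: 2"
```

The length drops from 3 to 2 because the two bytes 0xC2 0xB5 become the single character U+00B5.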
The corruption you are seeing happens when the bytes of a UTF-8 value are not recognized as belonging to one character (the µ shows up as two separate values, 0xC2 0xB5, when Dumpered; after the above change, it ought to show up as the single character 0xB5).
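The following sketch shows one way this kind of corruption can arise. It rests on an assumption that is not confirmed by the question: that a decoded wide character (such as the Ω an entity decoder would produce) ends up in the same string as still-undecoded UTF-8 bytes. The helper `utf8_hex` is hypothetical and exists only to display the result.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: encode a character string as UTF-8 and list its bytes in hex.
sub utf8_hex {
    my ($str) = @_;
    my $octets = encode('UTF-8', $str);
    return join ' ', map { sprintf('%02X', ord $_) } split //, $octets;
}

my $omega       = chr(0x3A9);        # Ω as an already-decoded character (assumed)
my $micro_bytes = "\x{C2}\x{B5}";    # µ as undecoded UTF-8 bytes

# Undecoded: 0xC2 and 0xB5 are treated as the two characters Â and µ.
my $broken = utf8_hex($omega . $micro_bytes);

# Decoded first: the two bytes become the single character µ.
my $fixed  = utf8_hex($omega . decode('UTF-8', $micro_bytes));

print "broken: $broken\n";   # prints "broken: CE A9 C3 82 C2 B5"
print "fixed:  $fixed\n";    # prints "fixed:  CE A9 C2 B5"
```

The extra C3 82 in the broken output is the UTF-8 encoding of Â, which matches the stray character in front of the µ.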
There are a ton of settings that can affect this in Perl (the encoding of the source file, use utf8, the I/O layers on your filehandles, and so on). The above is most likely the right combination of changes for your given settings. Otherwise, some variation of it (encode instead of decode, decode('latin1', ...), etc.) should solve the problem.