I want to remove diacritic signs in some strings.
my $str1 = 'èîü';
my $str2 = $str1;
$str1 =~ tr/î/i/;
print "$str1\n"; # => i�iii�
$str2 =~ s/î/i/;
print "$str2\n"; # => èiü
When you don't have use utf8;, but you are viewing the code with a UTF-8 text editor, you're not seeing it the way perl sees it. You think you have a single character in the left half of your tr///, but because it's multiple bytes, perl sees it as multiple characters.
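A small sketch of that difference, assuming the script file itself is saved as UTF-8 (the variable names are only illustrative). The same literal is measured once without the pragma and once with it:

```perl
use strict;
use warnings;

my ($bytes_len, $chars_len);
{
    no utf8;                      # literal is read as raw bytes
    $bytes_len = length('î');     # two bytes: 0xC3 0xAE
}
{
    use utf8;                     # literal is decoded from UTF-8
    $chars_len = length('î');     # one character: U+00EE
}
print "without use utf8: $bytes_len\n";
print "with use utf8:    $chars_len\n";
```

With the file saved as UTF-8 this should report 2 and then 1.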
In s///, since none of the characters are regexp operators, you're just doing a substring search. You're searching for a multi-character substring. And you find it, because the same thing that happened in your s/// is also happening in your string literals: the characters you think are in there really aren't, but the multi-character sequence is.
In tr///, on the other hand, multiple characters aren't treated as a sequence, they're treated as a set. Each character (byte) is handled separately when it is found. And that doesn't get you the results you want, because changing the individual bytes of a UTF-8 string is never what you want.
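The byte-level behaviour can be reproduced with explicit escapes, which avoids depending on how the file is saved. This is only a sketch: the first half spells out the bytes perl sees without use utf8; the second half is one possible fix, assuming the file is saved as UTF-8 and that mapping each accented letter to its base letter by hand is acceptable.

```perl
use strict;
use warnings;

# What perl sees without "use utf8": six bytes, not three characters.
my $bytes = "\xC3\xA8\xC3\xAE\xC3\xBC";        # UTF-8 bytes of èîü
(my $mangled = $bytes) =~ tr/\xC3\xAE/i/;      # both bytes of î map to "i"
print unpack('H*', $mangled), "\n";            # hex dump of the damaged bytes

# With "use utf8": the literals are characters, so tr/// maps one to one.
my $fixed;
{
    use utf8;
    my $chars = 'èîü';
    ($fixed = $chars) =~ tr/èîü/eiu/;
}
print "$fixed\n";
```

The hex dump comes out as 69a8696969bc: every 0xC3 and the 0xAE became "i" (0x69), while the leftover 0xA8 and 0xBC are the invalid bytes that show up as � in the question's output. The second print gives eiu.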
The fact that you can run a simple ASCII-oriented substring search that knows nothing about UTF-8, and still get the correct result on a UTF-8 string, is considered a good backward-compatibility feature of UTF-8, as opposed to other encodings like UCS-2/UTF-16 or UCS-4.
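As an illustration of that property, here is a sketch of a purely byte-oriented search and replace on the UTF-8 encoding of èîü. The strings are written as explicit byte escapes, and the variable names are only illustrative:

```perl
use strict;
use warnings;

my $haystack = "\xC3\xA8\xC3\xAE\xC3\xBC";     # UTF-8 bytes of èîü
my $needle   = "\xC3\xAE";                     # UTF-8 bytes of î

# index() and s/// compare bytes; neither knows anything about UTF-8.
my $offset = index($haystack, $needle);        # byte offset of the match
(my $replaced = $haystack) =~ s/\Q$needle\E/i/;

print "offset: $offset\n";
print unpack('H*', $replaced), "\n";           # hex dump of the result
```

The match is found at byte offset 2 and the result is c3a869c3bc, which is the valid UTF-8 encoding of èiü. The search cannot match in the middle of another character, because in UTF-8 a lead byte such as 0xC3 never appears as a continuation byte.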