The function mb_detect_encoding() has a parameter for strict mode.
The first, most upvoted comment gives this example:
$str = 'áéóú'; // ISO-8859-1
mb_detect_encoding($str, 'UTF-8'); // 'UTF-8'
mb_detect_encoding($str, 'UTF-8', true); // false
I did not write mb_detect_encoding() and have not stepped through it with a debugger, so what follows is only my interpretation.
It seems that the intention was for strict mode to check whether the string as a whole is valid in the encoding, while non-strict mode would also accept a sub-sequence that could be part of a valid string. For example, if the string ended with what should be the first byte of a multi-byte character, it would not match in strict mode but would still qualify as UTF-8 in non-strict mode.
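To make that interpretation concrete, here is a rough PHP sketch of a validator with both modes. The function name utf8_is_valid() and its $strict parameter are my own hypothetical choices; this is not how mbstring implements the check. Strict mode requires every byte to belong to a complete, well-formed UTF-8 sequence, while non-strict mode additionally accepts a string that stops partway through a multi-byte character.

```php
<?php

/**
 * Hypothetical helper illustrating the strict/non-strict distinction
 * as interpreted above; NOT mbstring's implementation.
 *
 * $strict = true:  every byte must belong to a complete, well-formed sequence.
 * $strict = false: as above, but the string may end partway through a
 *                  multi-byte sequence (a truncated tail).
 */
function utf8_is_valid(string $s, bool $strict): bool
{
    $len = strlen($s);
    $i = 0;
    while ($i < $len) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $need = 0;                  // ASCII
        } elseif ($b >= 0xC2 && $b <= 0xDF) {
            $need = 1;                  // 2-byte sequence
        } elseif ($b >= 0xE0 && $b <= 0xEF) {
            $need = 2;                  // 3-byte sequence
        } elseif ($b >= 0xF0 && $b <= 0xF4) {
            $need = 3;                  // 4-byte sequence
        } else {
            return false;               // 0x80-0xC1 and 0xF5-0xFF cannot start a character
        }

        // Allowed range for the byte right after the lead byte; this rules out
        // overlong forms, UTF-16 surrogates and code points above U+10FFFF.
        $lo = 0x80;
        $hi = 0xBF;
        if ($b === 0xE0) {
            $lo = 0xA0;
        } elseif ($b === 0xED) {
            $hi = 0x9F;
        } elseif ($b === 0xF0) {
            $lo = 0x90;
        } elseif ($b === 0xF4) {
            $hi = 0x8F;
        }

        for ($k = 1; $k <= $need; $k++) {
            if ($i + $k >= $len) {
                // Input ends mid-character: only non-strict mode tolerates this.
                return !$strict;
            }
            $c = ord($s[$i + $k]);
            // Second byte uses the narrowed range, later bytes must be 10xxxxxx.
            $ok = ($k === 1) ? ($c >= $lo && $c <= $hi) : (($c & 0xC0) === 0x80);
            if (!$ok) {
                return false;
            }
        }
        $i += $need + 1;
    }
    return true;
}

var_dump(utf8_is_valid("foo\xe1", true));   // bool(false)
var_dump(utf8_is_valid("foo\xe1", false));  // bool(true)
```

Under this reading, "foo\xe1" passes only in non-strict mode, while "foo\xf8" and the ISO-8859-1 bytes "\xe1\xe9\xf3\xfa" fail in both modes, because the offending byte is not merely a truncated tail.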
However, there seems to be a bug* where, in some circumstances, non-strict mode checks only the first byte of the string.
The byte 0xf8 is not allowed anywhere in UTF-8. When it is placed at the start of a string, mb_detect_encoding() correctly returns false regardless of which mode is used:
$str = "\xf8foo";
var_dump(
    mb_detect_encoding($str, 'UTF-8'),      // bool(false)
    mb_detect_encoding($str, 'UTF-8', true) // bool(false)
);
But as long as the first byte of the string is one that may occur somewhere in UTF-8, non-strict mode returns UTF-8, even when an invalid byte follows:
$str = "foo\xf8";
var_dump(
    mb_detect_encoding($str, 'UTF-8'),      // string(5) "UTF-8"
    mb_detect_encoding($str, 'UTF-8', true) // bool(false)
);
So while your ISO-8859-1 string 'áéóú' is not valid UTF-8, its first byte "\xe1" can occur in UTF-8, and mb_detect_encoding() mistakenly reports the string as UTF-8.
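The non-strict results quoted above are consistent with a check that looks only at the first byte. The function below is a hypothetical model of that observed behavior, written only to compare against those results; it is not mbstring's actual code, and returning true for an empty string is an arbitrary assumption.

```php
<?php

/**
 * Hypothetical model of the reported non-strict behavior, NOT mbstring's code:
 * the verdict depends only on whether the first byte can occur somewhere in UTF-8.
 */
function looks_like_utf8_first_byte_only(string $s): bool
{
    if ($s === '') {
        return true; // assumption: the empty case is not covered by the examples
    }
    $b = ord($s[0]);
    // 0xC0, 0xC1 and 0xF5-0xFF never appear anywhere in well-formed UTF-8.
    return !($b === 0xC0 || $b === 0xC1 || $b >= 0xF5);
}

$samples = [
    'question string' => "\xe1\xe9\xf3\xfa", // 'áéóú' encoded as ISO-8859-1
    'leading 0xf8'    => "\xf8foo",
    'trailing 0xf8'   => "foo\xf8",
];

foreach ($samples as $label => $str) {
    // Compare with the non-strict mb_detect_encoding() results quoted above.
    printf("%-16s %s\n", $label, looks_like_utf8_first_byte_only($str) ? 'UTF-8' : 'false');
}
```

For all three samples this model gives the same verdict as the non-strict results quoted above, which is why I suspect only the first byte is being examined.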
*I've opened a report for this at https://bugs.php.net/bug.php?id=72933