I understand that Linux filesystem stores file names as byte sequences which is meant to be Unicode-encoding-independent.
But, encodings other than UTF-8 or Enhanced UTF-8 may very well use 0 byte as part of a multibyte representation of a Unicode character that can appear in a file name. And everywhere in Linux filesystem C code you terminate strings with 0 byte. So how does Linux filesystem support Unicode? Does it assume all applications that create filenames use UTF-8 only? But that is not true, is it?
Similarly, the shells (such as bash) use
There is no encoding of filenames in Linux (in ext family of filesystems at any rate). Filenames are sequences of bytes, not characters. It is up to application programs to interpret these bytes as UTF-8 or anything else. The filesystem doesn't care.
POSIX stipulates that the shell obeys the locale environment vsriables such as LC_CTYPE when performing pattern matches. Thus, pattern-matching code that just compares bytes regardless of the encoding would not be compatible with your hypothetical encoding, or with any stateful encoding. But this doesn't seem to matter much as such encodings are not commonly supported by existing locales. UTF-8 on the other hand seems to be supported well: in my experiments
bash correctly matches the
? character with a single Unicode character, rather than a single byte, in the filename (given an UTF-8 locale) as prescribed by POSIX.