I've read and heard that C++11 supports Unicode. A few questions on that:
How well does the C++ standard library support Unicode?
A quick scan through the library facilities that might provide Unicode support gives me this list:
I think all but the first one provide terrible support. I'll get back to it in more detail after a quick detour through your other questions.
Does std::string do what it should?
Yes. According to the C++ standard, this is what std::string and its siblings should do:
The class template basic_string describes objects that can store a sequence consisting of a varying number of arbitrary char-like objects with the first element of the sequence at position zero.
std::string does that just fine. Does that provide any Unicode-specific functionality? No.
Should it? Probably not.
std::string is fine as a sequence of char objects. That's useful; the only annoyance is that it is a very low-level view of text and standard C++ doesn't provide a higher-level one.
How do I use it?
Use it as a sequence of char objects; pretending it is something else is bound to end in pain.
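To make "a sequence of char objects" concrete, here is a minimal sketch, assuming the string happens to hold well-formed UTF-8 (std::string itself guarantees nothing of the sort). utf8_code_points is a hypothetical helper of mine, not a library function; it only shows that size() counts code units, so anything resembling a character count has to be written by hand.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts UTF-8 code points by skipping continuation
// bytes (bit pattern 10xxxxxx). Assumes `s` is well-formed UTF-8; it performs
// no validation.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;  // lead byte or ASCII byte: a new code point starts here
        }
    }
    return n;
}
```

For the string "na\xC3\xAFve" ("naïve" spelled out as UTF-8 bytes), size() returns 6 while the helper returns 5. Even that second number is a count of code points, not of user-perceived characters.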
Where are potential problems?
All over the place? Let's see...
The strings library provides us with basic_string, which is merely a sequence of what the standard calls "char-like objects". I call them code units. If you want a high-level view of text, this is not what you are looking for. This is a view of text suitable for serialization/deserialization/storage.
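As an illustration of why I call them code units, the following sketch stores the same code point, U+1F34C, in three basic_string specializations. The variable names are mine, and the UTF-8 bytes are written out by hand so the example doesn't depend on the source file's encoding.

```cpp
#include <string>

// One code point (U+1F34C) held in three different code unit types.
const std::string    banana8  = "\xF0\x9F\x8D\x8C";  // UTF-8 bytes, spelled out explicitly
const std::u16string banana16 = u"\U0001F34C";       // UTF-16: stored as a surrogate pair
const std::u32string banana32 = U"\U0001F34C";       // UTF-32: a single code unit
```

size() reports 4, 2 and 1 respectively: one code point, three different lengths, because size() counts code units and nothing else.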
It also provides some tools from the C library that can be used to bridge the gap between the narrow world and the Unicode world:
The localization library still believes that one of those "char-like objects" equals one "character". This is of course silly, and makes it impossible to get lots of things working properly beyond some small subset of Unicode like ASCII.
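A rough sketch of what that one-to-one assumption costs, using the two-argument std::toupper from the locale header. upper_per_code_unit is a hypothetical helper of mine, and I'm assuming the input is UTF-8 and the classic "C" locale is used. Because the function maps one char to exactly one char, the result always has the same length as the input, so a mapping such as ß to SS cannot be expressed.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: uppercases a string one code unit at a time with
// std::toupper(charT, const std::locale&). Each char in yields exactly one
// char out, so the output length always equals the input length.
std::string upper_per_code_unit(const std::string& in) {
    const std::locale& loc = std::locale::classic();  // assumption: "C" locale
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        out.push_back(std::toupper(c, loc));
    }
    return out;
}
```

Given "stra\xC3\x9F" "e" ("straße" as UTF-8 bytes; the literal is split so the hex escape doesn't swallow the e), the ASCII letters are uppercased but the result still has 7 code units, whereas the Unicode uppercase form STRASSE needs a different length.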
Consider, for example, what the standard calls "convenience interfaces" in the <locale> header:
    template <class charT> bool isspace (charT c, const locale& loc);
    template <class charT> bool isprint (charT c, const locale& loc);
    template <class charT> bool iscntrl (charT c, const locale& loc);
    // ...
    template <class charT> charT toupper(charT c, const locale& loc);
    template <class charT> charT tolower(charT c, const locale& loc);
    // ...
How do you expect any of these functions to properly categorize, say, U+1F34C ʙᴀɴᴀɴᴀ, as in