FlKo FlKo - 9 months ago 25
C++ Question

Where to put std::wstring_convert<std::codecvt_utf8<wchar_t>>?

I am planning a new C++11 Win32/64 project with C++Builder 10.1 (Clang 3.3), and I want to implement the core functions as portably as possible. I'd therefore like to use UTF-8 as the std::string encoding (also because it is the default encoding for SQLiteCpp, the SQLite C++ wrapper I intend to use).

For interacting with the Win-API I decided to use the .to_bytes() and .from_bytes() functions of std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>>, which comes from <codecvt> and <locale>.

So now I'd like to know: what is the best practice for where to place the converter object?

Should I give it its own unit and namespace, e.g.

.h:

...
#include <codecvt>
#include <locale>

namespace cnv
{
extern std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> wcu8;
}
...


.cpp:

...
namespace cnv
{
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> wcu8;
}
...


and include it everywhere to use cnv::wcu8.to_bytes(xyz) where needed?

Or is it better to create an instance within each function implementation where I need to convert between encodings?

Answer

I wouldn't store the std::wstring_convert in a global variable, because that's not thread-safe (to_bytes() and from_bytes() are non-const and modify the converter's internal state) and doesn't buy you much. Instantiating std::wstring_convert every time you need it may cost some performance, but that should not be your primary concern at the beginning (premature optimization).

So I'd just wrap that thing into functions:

std::wstring utf8_to_wstr( const std::string& utf8 ) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> wcu8;
    return wcu8.from_bytes( utf8 );
}

std::string wstr_to_utf8( const std::wstring& utf16 ) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> wcu8;
    return wcu8.to_bytes( utf16 );
}

You have to catch the std::range_error exception somewhere. std::wstring_convert throws it if the conversion fails for some reason (invalid code points, etc.).
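One possible way to handle that exception is to catch it right at the conversion boundary. The following is only a sketch: try_utf8_to_wstr is a hypothetical helper name, not part of the answer above, and reporting failure through a bool is just one of several reasonable designs.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper (an assumption for illustration): converts UTF-8 to a
// wide string and reports failure via the return value instead of letting
// std::range_error escape to the caller.
bool try_utf8_to_wstr(const std::string& utf8, std::wstring& out) {
    try {
        std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> wcu8;
        out = wcu8.from_bytes(utf8);  // throws std::range_error on invalid input
        return true;
    } catch (const std::range_error&) {
        return false;  // 'out' is left unchanged on failure
    }
}
```

A caller could then check the return value and decide locally how to react (log, substitute a placeholder, abort the operation), rather than wrapping larger parts of the program in a try block.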

If you hit performance bottlenecks regarding string conversions later, you can still instantiate std::wstring_convert directly at critical points in your code, e.g. outside of a long-running loop that converts many strings.
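As a rough illustration of that last point, the converter can be created once as a local variable and reused for every string in a batch. This is a minimal sketch; utf8_to_wstr_all is a hypothetical name chosen for the example, and whether it actually helps should be confirmed by profiling.

```cpp
#include <codecvt>
#include <locale>
#include <string>
#include <vector>

// Hypothetical batch helper (an assumption for illustration): converts many
// UTF-8 strings while constructing the converter only once.
std::vector<std::wstring> utf8_to_wstr_all(const std::vector<std::string>& inputs) {
    // One converter for the whole loop. It is a local object, so it is not
    // shared between threads.
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> wcu8;

    std::vector<std::wstring> result;
    result.reserve(inputs.size());
    for (const std::string& s : inputs) {
        // May throw std::range_error if 's' is not valid UTF-8.
        result.push_back(wcu8.from_bytes(s));
    }
    return result;
}
```

Because the converter lives on the stack of the calling function, this keeps the thread-safety advantage of the wrapper functions while avoiding repeated construction inside the loop.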