Trevin Corkery - 1 month ago
C++ Question

C++ - Converting wchar_t to network byte order and back

I am sending Unicode data (bytes, not characters) over sockets, and I want to make sure the endianness matches on both ends, because wchar_t is UTF-16 on Windows.

The receiving program is also mine, so it will know the data is UTF-16 and can handle it accordingly.

Here is my current code, which partly works but produces a strange result. (Both conversions are in the same application because I wanted to learn how to convert the data before sending it.)

case WM_CREATE: {
    // Convert String to NetworkByte
    wchar_t Data[] = L"This is a string";
    char* DataA = (char*)Data;
    unsigned short uData = htons((unsigned int)DataA);

    // Convert String to HostByte
    unsigned short hData = ntohs(uData);
    DataA = (char*)&hData;
    wchar_t* DataW = (wchar_t*)DataA;
    MessageBeep(0);

    break;
}


Result:

쳌쳌쳌쳌쳌곭쳌쳌쳌쳌쳌ē쳌쳌쳌쳌This is a string

Answer

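UTF-8 and UTF-16 store text in completely different ways. Casting wchar_t* to char* does not convert anything; it only reinterprets the same bytes, much like casting a float* to char*. In the question's code, htons is also applied to the pointer value (truncated to 16 bits) rather than to the characters, so DataW ends up pointing at hData and whatever follows it on the stack. That is where the garbage before the string comes from.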
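To make the difference concrete, here is a small sketch that encodes a single 16-bit code point as UTF-8 by hand. The function name utf8_bytes is made up for this illustration, and it assumes the input is not a surrogate, so it only covers U+0000 to U+FFFF.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration only: encode one code point in the
// range U+0000..U+FFFF as UTF-8. Surrogate values are not checked.
std::vector<std::uint8_t> utf8_bytes(char16_t cp)
{
    if (cp < 0x80)   // 1 byte: 0xxxxxxx
        return { static_cast<std::uint8_t>(cp) };
    if (cp < 0x800)  // 2 bytes: 110xxxxx 10xxxxxx
        return { static_cast<std::uint8_t>(0xC0 | (cp >> 6)),
                 static_cast<std::uint8_t>(0x80 | (cp & 0x3F)) };
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return { static_cast<std::uint8_t>(0xE0 | (cp >> 12)),
             static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)),
             static_cast<std::uint8_t>(0x80 | (cp & 0x3F)) };
}
```

For example, the Greek letter ε is the single UTF-16 code unit 0x03B5, but utf8_bytes(0x03B5) yields the two bytes CE B5. Reading the UTF-16 memory through a char* gives neither of those bytes in that form, which is why a cast cannot stand in for a conversion.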

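Use WideCharToMultiByte to convert UTF-16 to UTF-8 before passing the data to the network functions. UTF-8 is a sequence of single bytes, so there is no byte order to worry about.

When receiving UTF-8 from the network functions, use MultiByteToWideChar to convert it back to UTF-16 so that it can be used with Windows functions.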

Example:

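#include <string>
#include <windows.h>

// Convert UTF-16 to UTF-8. Passing the explicit length (rather than -1)
// keeps the terminating null out of the returned string.
std::string get_utf8(const std::wstring &wstr)
{
    if (wstr.empty()) return std::string();
    int len = static_cast<int>(wstr.size());
    // First call: ask how many bytes the UTF-8 output needs.
    int sz = WideCharToMultiByte(CP_UTF8, 0, wstr.data(), len, nullptr, 0, nullptr, nullptr);
    std::string res(sz, '\0');
    // Second call: perform the conversion into the buffer.
    WideCharToMultiByte(CP_UTF8, 0, wstr.data(), len, &res[0], sz, nullptr, nullptr);
    return res;
}

// Convert UTF-8 back to UTF-16, using the same two-call pattern.
std::wstring get_utf16(const std::string &str)
{
    if (str.empty()) return std::wstring();
    int len = static_cast<int>(str.size());
    int sz = MultiByteToWideChar(CP_UTF8, 0, str.data(), len, nullptr, 0);
    std::wstring res(sz, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, str.data(), len, &res[0], sz);
    return res;
}

int main()
{
    std::wstring greek = L"ελληνικά";

    std::string utf8 = get_utf8(greek);
    // use utf8.data() and utf8.size() for the network function...

    // convert utf8 back to utf16 so it can be displayed in Windows:
    std::wstring utf16 = get_utf16(utf8);
    MessageBoxW(0, utf16.c_str(), 0, 0);

    return 0;
}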
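
If you would rather keep UTF-16 on the wire, as the question attempts, the byte swap has to be applied to every 16-bit code unit, not to the pointer. The following is only a sketch of that alternative, not part of the Windows API: the names to_network_bytes and from_network_bytes are made up here, and it uses std::u16string with explicit shifts instead of wchar_t and htons so that it does not depend on the platform's wchar_t size or on the socket headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: write each UTF-16 code unit as two bytes,
// high byte first (big-endian, i.e. network byte order).
std::vector<std::uint8_t> to_network_bytes(const std::u16string &text)
{
    std::vector<std::uint8_t> out;
    out.reserve(text.size() * 2);
    for (char16_t unit : text) {
        out.push_back(static_cast<std::uint8_t>(unit >> 8));
        out.push_back(static_cast<std::uint8_t>(unit & 0xFF));
    }
    return out;
}

// Hypothetical helper: rebuild code units from big-endian byte pairs.
// A trailing odd byte, if any, is ignored.
std::u16string from_network_bytes(const std::vector<std::uint8_t> &bytes)
{
    std::u16string text;
    text.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        text.push_back(static_cast<char16_t>((bytes[i] << 8) | bytes[i + 1]));
    }
    return text;
}
```

Because the bytes are produced with shifts, the result is the same on little-endian and big-endian hosts. The receiver still needs to know how many bytes to expect, for instance by sending the length first.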