C++ Question

C++ / wcout / UTF-8

I'm reading a UTF-8 encoded Unicode text file and outputting it to the console, but the displayed characters are not the same as in the text editor I used to create the file. Here is my code:

#define UNICODE

#include <windows.h>
#include <iostream>
#include <fstream>
#include <string>
#include <cstdlib>

#include "pugixml.hpp"

using std::ifstream;
using std::ios;
using std::string;
using std::wstring;
using std::wcout;

int main( int argc, char * argv[] )
{
    ifstream oFile;

    try
    {
        string sContent;

        oFile.open( "../config-sample.xml", ios::in );

        if( oFile.is_open() )
        {
            wchar_t wsBuffer[128];

            while( oFile.good() )
            {
                oFile >> sContent;
                // Third argument is the maximum number of wide characters, not bytes.
                mbstowcs( wsBuffer, sContent.c_str(), sizeof( wsBuffer ) / sizeof( wchar_t ) );
                //wprintf( wsBuffer ); // Same result as wcout.
                wcout << wsBuffer;
            }

            Sleep( 100000 );
        }
        else
        {
            throw L"Failed to open file";
        }
    }
    catch( const wchar_t * pwsMsg )
    {
        ::MessageBox( NULL, pwsMsg, L"Error", MB_OK | MB_TOPMOST | MB_SETFOREGROUND );
    }

    if( oFile.is_open() )
    {
        oFile.close();
    }

    return 0;
}


There must be something I don't get about encoding.

Answer

The problem is that mbstowcs doesn't actually use UTF-8. It uses an older style of "multibyte code points" based on the current locale's code page, which is not compatible with UTF-8 (although it is technically possible [I believe] to define a UTF-8 code page, there is no such thing in Windows).
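
As a small sketch of what goes wrong (my own illustration, assuming a Windows-1252 system locale): mbstowcs decodes the bytes with the C locale's code page, so a two-byte UTF-8 sequence comes out as two separate characters.

    #include <clocale>
    #include <cstdlib>
    #include <cstdio>

    int main()
    {
        setlocale( LC_ALL, "" );              // pick up the system ANSI code page (e.g. 1252)

        const char szUtf8[] = "\xC3\xA9";     // UTF-8 encoding of U+00E9 ('é')
        wchar_t wsOut[8] = { 0 };

        mbstowcs( wsOut, szUtf8, 8 );

        // With code page 1252 this yields two characters, U+00C3 ('Ã') and U+00A9 ('©'),
        // instead of the single character U+00E9.
        printf( "%04X %04X\n", (unsigned)wsOut[0], (unsigned)wsOut[1] );
        return 0;
    }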

If you want to convert UTF-8 to UTF-16, you can use MultiByteToWideChar with a code page of CP_UTF8.
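
For illustration, a minimal sketch of such a conversion (the helper name Utf8ToUtf16 is mine, and error handling is kept to the bare minimum):

    #include <windows.h>
    #include <string>

    // Convert a UTF-8 encoded std::string to a UTF-16 std::wstring.
    std::wstring Utf8ToUtf16( const std::string & sUtf8 )
    {
        if( sUtf8.empty() )
        {
            return std::wstring();
        }

        // First call: ask how many UTF-16 code units the result needs.
        int iLen = ::MultiByteToWideChar( CP_UTF8, 0,
                                          sUtf8.c_str(), (int)sUtf8.size(),
                                          NULL, 0 );
        if( iLen <= 0 )
        {
            return std::wstring(); // conversion failed
        }

        std::wstring wsResult( iLen, L'\0' );

        // Second call: do the actual conversion into the buffer.
        ::MultiByteToWideChar( CP_UTF8, 0,
                               sUtf8.c_str(), (int)sUtf8.size(),
                               &wsResult[0], iLen );

        return wsResult;
    }

In your read loop this would replace the mbstowcs call, e.g. wcout << Utf8ToUtf16( sContent );.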