A.K - 1 year ago
C++ Question

What is the difference between CW2A((LPCWSTR)str) and CW2A((LPCWSTR)str, CP_UTF8)?

I am trying to convert a few CStringW strings to CStringA strings. One of the strings (let's call it otherLangString) is in another language (Chinese, Arabic, etc.). All the other strings converted without any issue when I used this:

CW2A((LPCWSTR)some_String);

But for otherLangString, I got "?????".
To fix that, I did this, and it worked:

CW2A(some_String, CP_UTF8);

Now all the conversions in the code looked like the first sample, except one that looked like the second sample.

For consistency, I combined the two and used this everywhere:

CW2A((LPCWSTR)some_String, CP_UTF8);

My question is: what is the difference between the following?

- CW2A((LPCWSTR)some_String, CP_UTF8) and CW2A(some_String, CP_UTF8);
- CW2A((LPCWSTR)some_String) and CW2A(some_String, CP_UTF8);

Answer

CW2A is a typedef for CW2AEX<>, and its constructor is documented. The constructor that takes two arguments lets you explicitly specify the code page used for the conversion:

The code page used to perform the conversion. See the code page parameter discussion for the Windows SDK function MultiByteToWideChar for more details.

If you don't specify a code page, the current thread's ANSI code page is used for the conversion (you rarely want that). This is explained under ATL and MFC String Conversion Macros:

By default, the ATL conversion classes and macros will use the current thread's ANSI code page for the conversion. If you want to override that behavior for a specific conversion using macros based on the classes CA2WEX or CW2AEX, specify the code page as the second parameter to the constructor for the class.
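The first comparison in the question comes down to the cast. As far as I know, CStringW (through CSimpleStringT) has an implicit conversion operator to PCXSTR, i.e. const wchar_t*, so passing a CStringW where the constructor expects LPCWSTR invokes that operator anyway and the explicit (LPCWSTR) cast is redundant. The second comparison differs only in the code page argument. The portable sketch below models both points without ATL. MiniStringW, Received, Convert and kAnsiPlaceholder are hypothetical names, a default argument stands in for CW2AEX's separate one-argument constructor, and only 65001 (the value of CP_UTF8) is a real Windows constant.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for CStringW. Like CSimpleStringT, it offers an
// implicit conversion to a const wide-character pointer (PCXSTR).
class MiniStringW {
public:
    explicit MiniStringW(const wchar_t* s) : buf_(s) {}
    operator const wchar_t*() const { return buf_.c_str(); }

private:
    std::wstring buf_;
};

// Placeholder for "the current thread's ANSI code page" (hypothetical value).
constexpr unsigned kAnsiPlaceholder = 0;
// Numeric value of the Windows CP_UTF8 constant.
constexpr unsigned kUtf8 = 65001;

// What a converter "sees": the pointer it was handed and the code page.
struct Received {
    const wchar_t* psz;
    unsigned codePage;
};

// Hypothetical converter with the same parameter shape as the CW2A
// constructors: a wide-string pointer plus an optional code page.
// It performs no conversion; it only records its arguments.
Received Convert(const wchar_t* psz, unsigned codePage = kAnsiPlaceholder) {
    return Received{psz, codePage};
}
```

With MiniStringW s(L"abc");, the calls Convert((const wchar_t*)s, kUtf8) and Convert(s, kUtf8) receive the same pointer and the same code page, while Convert(s) falls back to the default, mirroring CW2A(some_String) without a code page.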

In your case,

CW2A((LPCWSTR)some_String);

converts from UTF-16 to a narrow character string using the thread's current ANSI code page. The result is only meaningful when interpreted using that same ANSI code page. To make matters worse, an ANSI code page cannot represent all Unicode characters: characters it has no mapping for are replaced with a default character (usually ?), which is where your "?????" came from.
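To see why the ANSI conversion produced "?????", here is a portable model of a lossy narrowing conversion. Purely for illustration, it assumes the ANSI code page is ISO-8859-1 (Latin-1), where U+0000 to U+00FF map to the byte with the same value. Real ANSI code pages such as Windows-1252 differ in the details, and the real conversion (WideCharToMultiByte) may also apply best-fit mappings, which this sketch ignores. NarrowToLatin1 is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Illustrative model of a lossy "ANSI" conversion, assuming Latin-1 as the
// code page. Anything outside U+0000..U+00FF has no byte to map to and is
// replaced by the default character '?', so the original text is lost.
std::string NarrowToLatin1(const std::u16string& utf16) {
    std::string out;
    for (std::size_t i = 0; i < utf16.size(); ++i) {
        const char16_t u = utf16[i];
        const bool highSurrogate = u >= 0xD800 && u <= 0xDBFF;
        if (highSurrogate && i + 1 < utf16.size() &&
            utf16[i + 1] >= 0xDC00 && utf16[i + 1] <= 0xDFFF) {
            ++i;                  // a surrogate pair is one character outside Latin-1
            out.push_back('?');
        } else if (u <= 0xFF) {
            out.push_back(static_cast<char>(u));  // representable: same value
        } else {
            out.push_back('?');   // not representable: replaced
        }
    }
    return out;
}
```

Each Chinese or Arabic character collapses to a single ?, so the five-letter Arabic word مرحبا becomes "?????" and cannot be converted back.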

The other piece of code

CW2A(some_String, CP_UTF8);

converts from UTF-16 to UTF-8. This is generally preferable, since the conversion is lossless and the target encoding is explicit. Both encodings can represent the full set of Unicode characters, and the result can be decoded by any reader that understands UTF-8.
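In contrast to the ANSI conversion, the following sketch encodes UTF-16 as UTF-8 by hand, which is the kind of output CW2A(some_String, CP_UTF8) gives you. Every code point, including those stored as surrogate pairs, gets its own byte sequence, so nothing is replaced. Utf16ToUtf8 is a hypothetical name, and mapping unpaired surrogates to U+FFFD is an assumption of this sketch; in real Windows code, WideCharToMultiByte with CP_UTF8 does this work.

```cpp
#include <cassert>
#include <string>

// Illustrative UTF-16 -> UTF-8 encoder. Assumption for brevity: unpaired
// surrogates are encoded as U+FFFD (the replacement character).
std::string Utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into one code point above U+FFFF.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // lone surrogate
        }
        if (cp < 0x80) {            // 1 byte: 0xxxxxxx
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {  // 3 bytes
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {                    // 4 bytes
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

For example, U+4E2D (中) becomes the three bytes E4 B8 AD, and U+1F600, stored in UTF-16 as the pair D83D DE00, becomes F0 9F 98 80.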

Note: In general, you cannot directly use a UTF-8 encoded string stored in a CStringA with the Windows API. It is safe to send its contents over a network or write them to disk, but if you want to pass it to the Windows API (e.g. for display), you have to convert it to UTF-16 first. The ANSI versions of the Windows API do not support UTF-8 (unless, on Windows 10 version 1903 or later, the application opts in to UTF-8 as its ANSI code page through its manifest).
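Going the other way, before handing a UTF-8 CStringA to a wide (W) Windows API you need to decode it back to UTF-16; with ATL that is typically CA2W(utf8String, CP_UTF8), which is built on MultiByteToWideChar. The portable sketch below shows what that decoding involves. Utf8ToUtf16 is a hypothetical name, and replacing malformed bytes with U+FFFD one byte at a time without rejecting overlong encodings is a simplification, not a description of what Windows does.

```cpp
#include <cassert>
#include <string>

// Illustrative UTF-8 -> UTF-16 decoder. Simplifications: a malformed
// sequence yields U+FFFD and decoding resumes at the next byte; overlong
// encodings are not rejected.
std::u16string Utf8ToUtf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        int extra;     // number of continuation bytes expected
        char32_t cp;
        if (lead < 0x80)                { extra = 0;  cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1;  cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2;  cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3;  cp = lead & 0x07; }
        else                            { extra = -1; cp = 0; }

        bool ok = extra >= 0 && i + extra < in.size();
        for (int k = 1; ok && k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            ok = (c & 0xC0) == 0x80;  // must be a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            out.push_back(u'\uFFFD');  // malformed: substitute, skip one byte
            ++i;
            continue;
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {                       // split into a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += static_cast<std::size_t>(extra) + 1;
    }
    return out;
}
```

Decoding the bytes E4 B8 AD E6 96 87 yields U+4E2D U+6587 (中文) again, so a UTF-16 to UTF-8 to UTF-16 round trip loses nothing.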