Svetsi Svetsi -4 years ago 191
C++ Question

Codepoint from Unicode Character?

This question has been asked before, but its solution depends on the Microsoft Foundation Classes (MFC), which I don't want to rely on. Basically, what I want to do is convert a Unicode character into its equivalent code point.

The code below was the solution using MFC. Is there a way of doing this without using afxwin.h?

#include <afxwin.h>

#include <iostream>

int main() {
    using namespace std;

    TCHAR myString[50] = _T("عربى");
    int stringLength = _tcslen(myString); // <----- edit here

    for (int i = 0; i < stringLength; i++) {
        unsigned int number = myString[i]; // numeric value of each TCHAR
        cout << number << endl;
    }
}


Answer Source


If your compiler supports it, the easiest way to do this is probably to write your constant string as U"عربى". This gives you an array of char32_t characters whose code points are just their value converted with static_cast<uint32_t>(). To print them in standard format, just prepend U+ and print the hex value.

Try this on a compiler that supports C++11 or later (I recommend saving the source file as UTF-8).

#include <cstdlib>
#include <iomanip>
#include <iostream>

using std::cout;

int main()
{
  constexpr char32_t codepoints[] = U"عربى";
  constexpr size_t n = sizeof(codepoints)/sizeof(char32_t);

  cout.setf( cout.hex, cout.basefield );     // Output in hex
  cout.setf( cout.right, cout.adjustfield ); // Right-align so the padding is prepended
  cout.fill('0');                            // Pad with leading zeroes
  // Don't print the terminating U'\0'.
  for ( size_t i = 0; i < n && codepoints[i]; ++i )
    cout << "U+" << std::setw(4) << static_cast<unsigned long>(codepoints[i]) << std::endl;

  return EXIT_SUCCESS;
}
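
The same idea can be wrapped in small helpers when the text is not a compile-time constant. This is only a minimal sketch, assuming the text is already held in a std::u32string; the names format_code_point and code_points_of are made up for illustration and are not part of any library.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Format one code point in the conventional "U+XXXX" notation:
// at least four uppercase hex digits, more if the value needs them.
std::string format_code_point(char32_t cp)
{
    char buf[16];
    std::snprintf(buf, sizeof buf, "U+%04lX", static_cast<unsigned long>(cp));
    return buf;
}

// Hypothetical helper: one "U+XXXX" string per element of a UTF-32 string.
// In UTF-32 every element is a complete code point, so no decoding is needed.
std::vector<std::string> code_points_of(const std::u32string& text)
{
    std::vector<std::string> out;
    for (char32_t c : text)
        out.push_back(format_code_point(c));
    return out;
}
```

Calling code_points_of(U"عربى") should give one entry per Arabic letter, which can then be printed or logged as needed.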


The C++ standard library has had <codecvt> since C++11 (note that it is deprecated as of C++17), which can convert from UTF-8 or UTF-16 to UCS-4/UTF-32. Example code:

#include <fstream>
#include <iostream>
#include <string>
#include <locale>
#include <codecvt>

void prepare_file()
{
  // UTF-16le data (if host system is little-endian)
  char16_t utf16le[4] ={0x007a, // latin small letter 'z' U+007a
                        0x6c34, // CJK ideograph "water"  U+6c34
                        0xd834, 0xdd0b}; // musical sign segno U+1d10b
  // store in a file
  std::ofstream fout("text.txt", std::ios::binary);
  fout.write( reinterpret_cast<char*>(utf16le), sizeof utf16le);
}

int main()
{
  prepare_file();
  // open as a byte stream
  std::wifstream fin("text.txt", std::ios::binary);
  // apply facet
  fin.imbue(std::locale(fin.getloc(), new std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>));

  // print each decoded character as a number, not as a glyph
  for (wchar_t c; fin.get(c); )
    std::cout << std::showbase << std::hex << static_cast<unsigned long>(c) << '\n';
}
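
The segno in the sample data is stored as two UTF-16 code units (0xd834, 0xdd0b) that together stand for U+1d10b. As a rough sketch of the arithmetic behind that pairing, here is a hand-written function rather than the <codecvt> facet itself; the name utf16_to_code_points is hypothetical, and passing unpaired surrogates through unchanged is an assumption of this sketch.

```cpp
#include <cstddef>
#include <string>

// Combine UTF-16 code units into code points. A high surrogate
// (0xD800-0xDBFF) followed by a low surrogate (0xDC00-0xDFFF) encodes one
// code point above U+FFFF; every other unit is already a code point.
// Assumption: unpaired surrogates are copied through unchanged.
std::u32string utf16_to_code_points(const std::u16string& in)
{
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t unit = in[i];
        bool is_high = unit >= 0xD800 && unit <= 0xDBFF;
        if (is_high && i + 1 < in.size()) {
            char32_t next = in[i + 1];
            if (next >= 0xDC00 && next <= 0xDFFF) {
                // 10 bits from each surrogate, offset by 0x10000.
                out.push_back(0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00));
                ++i; // the low surrogate has been consumed as well
                continue;
            }
        }
        out.push_back(unit);
    }
    return out;
}
```

For the pair above this works out to 0x10000 + (0x34 << 10) + 0x10b = 0x1d10b, which matches the comment in the sample data.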

C11 and C++11 also have functions, declared in <uchar.h>, to convert between multibyte strings (UTF-8 when a UTF-8 locale is active) and UTF-16 or UTF-32 strings. The mbstowcs() function might be relevant, too.

#include <stdio.h>
#include <locale.h>
#include <string.h>
#include <uchar.h>
#include <assert.h>   

mbstate_t state;

int main(void)
{
  setlocale(LC_ALL, "en_US.utf8");
  const char *str = u8"z\u00df\u6c34\U0001F34C"; // or u8"zß水🍌"
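
The listing above breaks off after the string is declared. As a locale-independent sketch of the same conversion, here is a hand-written UTF-8 decoder, not the <uchar.h> functions the answer refers to. The name utf8_to_code_points is hypothetical, and the sketch only checks sequence structure: it does not reject overlong encodings or encoded surrogates.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode UTF-8 bytes into code points without relying on the locale.
// Throws std::runtime_error on a bad lead byte, a truncated sequence,
// or a bad continuation byte.
std::u32string utf8_to_code_points(const std::string& in)
{
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len;
        char32_t cp;
        // The lead byte's high bits give the sequence length.
        if      (lead < 0x80)         { len = 1; cp = lead; }
        else if ((lead >> 5) == 0x06) { len = 2; cp = lead & 0x1F; }
        else if ((lead >> 4) == 0x0E) { len = 3; cp = lead & 0x0F; }
        else if ((lead >> 3) == 0x1E) { len = 4; cp = lead & 0x07; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::runtime_error("truncated UTF-8 sequence");

        // Each continuation byte (10xxxxxx) contributes six more bits.
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Applied to the UTF-8 bytes of the string in the listing, this should yield the four code points named by its escapes: U+007A, U+00DF, U+6C34 and U+1F34C.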