Nulik Nulik - 3 months ago 17
C Question

How do I print a Unicode character in C encoded with UTF-8?

I am trying to print a magnifying glass (http://www.fileformat.info/info/unicode/char/1f50e/index.htm), and I get this error:

[niko@dev1 ncurses]$ gcc -o utf8 -std=c99 $(ncursesw5-config --cflags --libs) utf8.c
utf8.c: In function ‘main’:
utf8.c:12:10: error: \ud83d is not a valid universal character
printw("\ud83ddd0e\n"); // escaped Unicode
^
[niko@dev1 ncurses]$ cat utf8.c
#include <locale.h>
#include <curses.h>
#include <stdlib.h>


int main (int argc, char *argv[])
{
  setlocale(LC_ALL, "");

  initscr();

  printw("\ud83ddd0e\n"); // escaped Unicode

  getch();
  endwin();

  return EXIT_SUCCESS;
}


What is the problem here? For example, if I have the decimal value of the encoding, which for the magnifying glass is 55357, how would I print it with printf to the ncurses screen (without using wchar_t, because it wastes a lot of memory)?

Answer

The information on fileformat.info is wrong as far as C is concerned. The escapes on the page are \ud83d\udd0e. This is a UTF-16 surrogate pair as used in Java, but it does not work in C: each \u escape must name one complete Unicode code point, and the C standard does not allow universal character names in the surrogate range D800 through DFFF. Half of a surrogate pair is not a code point, so GCC rejects \ud83d.
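
To see how the two halves relate to the real code point, here is a minimal sketch of the standard UTF-16 surrogate arithmetic. The function name combine_surrogates is made up for illustration and is not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a library function): combine a UTF-16
   surrogate pair into the single code point it represents.
   hi must be a high surrogate (0xD800..0xDBFF) and
   lo a low surrogate (0xDC00..0xDFFF). */
uint32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
    assert(hi >= 0xD800 && hi <= 0xDBFF);
    assert(lo >= 0xDC00 && lo <= 0xDFFF);

    /* Each half carries 10 bits; the result is offset by 0x10000. */
    return 0x10000u
         + (((uint32_t)hi - 0xD800u) << 10)
         + ((uint32_t)lo - 0xDC00u);
}
```

Called with 0xD83D and 0xDD0E, this yields 0x1F50E, which is the value a C escape has to name.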

You should instead use \U (uppercase) with 8 hexadecimal digits, so U+1F50E becomes \U0001F50E. This escape prints properly with printf, but for some reason, even after reading this article.
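
If the code point is only known as a number at run time, as in the question, it can be turned into UTF-8 bytes by hand without wchar_t. Note that 55357 is 0xD83D, which is only the first surrogate half; the code point of the magnifying glass is 0x1F50E (128270 in decimal). The following is a rough sketch of the usual UTF-8 bit layout; utf8_encode is a hypothetical name, and the sketch assumes the terminal and locale expect UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a library function): encode one Unicode
   code point as a NUL-terminated UTF-8 string in buf, which must hold
   at least 5 bytes. Returns the number of bytes written before the NUL,
   or 0 if cp is a surrogate half or above U+10FFFF. */
size_t utf8_encode(uint32_t cp, char *buf)
{
    size_t n;

    assert(buf != NULL);
    buf[0] = '\0';

    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogate halves are not characters */

    if (cp < 0x80) {                    /* 1 byte: 0xxxxxxx */
        buf[0] = (char)cp;
        n = 1;
    } else if (cp < 0x800) {            /* 2 bytes: 110xxxxx 10xxxxxx */
        buf[0] = (char)(0xC0 | (cp >> 6));
        buf[1] = (char)(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000) {          /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        buf[0] = (char)(0xE0 | (cp >> 12));
        buf[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (char)(0x80 | (cp & 0x3F));
        n = 3;
    } else if (cp <= 0x10FFFF) {        /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        buf[0] = (char)(0xF0 | (cp >> 18));
        buf[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (char)(0x80 | (cp & 0x3F));
        n = 4;
    } else {
        return 0;                       /* beyond the Unicode range */
    }

    buf[n] = '\0';
    return n;
}
```

For 0x1F50E this writes the four bytes F0 9F 94 8E. The resulting buffer is an ordinary char string, so it can be passed to printf("%s", buf) or, in a program linked against ncursesw, to printw("%s", buf).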


P.S.: If instead of the magnifying glass you see something like ~_~T~N, make sure that you have called setlocale and actually linked against -lncursesw; failing to do either means that garbage will be printed instead.
