MD XF, 3 years ago
C Question

Trying to read wide char gives EOF

I've got a text file, foo.txt, with these contents:

R⁸2


I had a large program that reads it and does things with each character, but it always received EOF when it hit the ⁸. Here are the relevant portions of the code:

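#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

int main(int argc, char *argv[])
{
    setlocale(LC_ALL, "");

    if (argc < 2)
        return 1;
    FILE *in = fopen(argv[1], "r");
    if (in == NULL)
        return 1;

    while (1) {
        wint_t c = getwc(in);
        printf("%d ", wctob(c));

        if (c == -1)
            printf("Error %d: %s\n", errno, strerror(errno));

        if (c == WEOF)
            return 0;
    }
}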


It prints 82 -1 (the ASCII code for R, then the value of EOF).
and EOF). No matter where I have the
¹
in the file, it always reads as EOF. Edit, I added a check for
errno
and it gives this:

Error 84: Invalid or incomplete multibyte or wide character


However, ⁸ is Unicode U+2078 'SUPERSCRIPT EIGHT'. I wrote it to foo.txt via cat, copy-pasting it from fileformat.info. A hexdump of foo.txt shows:

0000000: 52e2 81b8 32 R...2
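As a sanity check on that hexdump, the three bytes e2 81 b8 can be decoded by hand using the UTF-8 bit layout 1110xxxx 10xxxxxx 10xxxxxx. This is a minimal sketch; the helper name decode_utf8_3 is hypothetical, and it only validates the lead and continuation bit patterns of a three-byte sequence (it does not reject overlong forms or surrogates).

```c
/* Hypothetical helper: decode one three-byte UTF-8 sequence.
   Returns the code point, or -1 if the lead/continuation bits are wrong. */
long decode_utf8_3(const unsigned char b[3])
{
    if ((b[0] & 0xF0) != 0xE0 ||        /* lead byte must be 1110xxxx */
        (b[1] & 0xC0) != 0x80 ||        /* continuation bytes must be 10xxxxxx */
        (b[2] & 0xC0) != 0x80)
        return -1;

    return ((long)(b[0] & 0x0F) << 12)  /* 4 payload bits from the lead byte */
         | ((long)(b[1] & 0x3F) << 6)   /* 6 bits from the first continuation byte */
         |  (long)(b[2] & 0x3F);        /* 6 bits from the second continuation byte */
}
```

For the bytes in foo.txt this gives (0x2 << 12) | (0x01 << 6) | 0x38 = 0x2078, so the file itself holds valid UTF-8 for U+2078.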


What's the problem?

Answer

1. Check for WEOF instead of EOF

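EOF is meant for single-byte characters; WEOF is its counterpart for wide characters. getwc returns WEOF, not EOF, both at end of file and when the input is not a valid multibyte sequence in the current locale; in the latter case it also sets errno to EILSEQ (error 84 on Linux, the error shown in the question). The two macros need not have the same type or value, so compare the result of getwc against WEOF, then check feof (and ferror) to tell which case occurred.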

For example, glibc's stdio.h has:

#define EOF (-1)

and its wchar.h has:

#define WEOF (0xffffffffu)
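Because getwc reports both end of file and a conversion failure as WEOF, the stream's indicators are what tell them apart. A minimal sketch, assuming a hypothetical helper count_wide_chars that reads until WEOF and then checks feof and ferror:

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: count the wide characters in a stream.
   Returns the count if getwc stopped at end of file, or -1 if it stopped
   because of a read error or an invalid multibyte sequence (for the
   latter, getwc also sets errno to EILSEQ). */
long count_wide_chars(FILE *in)
{
    long n = 0;

    while (getwc(in) != WEOF)   /* compare against WEOF, not EOF */
        n++;

    /* WEOF alone does not say why reading stopped: only a set
       end-of-file indicator with no error indicator means a clean end. */
    if (ferror(in) || !feof(in))
        return -1;
    return n;
}
```

For the question's foo.txt this should return 3 when a UTF-8 locale is active, and -1 when the locale cannot decode ⁸; only in the -1 case is it meaningful to print strerror(errno).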

2. Set the locale to one supporting Unicode

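The default locale of a C program is C, also called POSIX, which is only meant for ASCII. setlocale(LC_ALL, "") switches to whatever locale the environment names (through LC_ALL, LC_CTYPE or LANG); if none of those is set, or they name C/POSIX or a locale that isn't installed, the program stays in an ASCII-only locale and getwc can fail on ⁸ with EILSEQ, as in the question. In that case, explicitly select a locale whose character encoding is UTF-8. C.UTF-8 is widely available on Linux, but neither the C standard nor every platform guarantees it.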

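setlocale(LC_ALL, "C.UTF-8");    /* every category */

or, to change only character handling:

setlocale(LC_CTYPE, "C.UTF-8");  /* the category getwc's decoding depends on */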
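Rather than assuming a particular locale exists, a program can try the environment's locale first and fall back to C.UTF-8, checking each candidate by actually decoding ⁸ with mbrtowc. This is a sketch under stated assumptions: the helper names ctype_decodes_utf8 and select_utf8_locale are hypothetical, and the fallback only works where a locale named "C.UTF-8" is installed.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: returns 1 if the current LC_CTYPE locale decodes
   the UTF-8 bytes e2 81 b8 as U+2078 SUPERSCRIPT EIGHT, else 0. */
int ctype_decodes_utf8(void)
{
    const char sup8[] = "\xe2\x81\xb8";
    mbstate_t st;
    wchar_t wc = 0;

    memset(&st, 0, sizeof st);              /* initial conversion state */
    return mbrtowc(&wc, sup8, 3, &st) == 3 && wc == 0x2078;
}

/* Hypothetical helper: use the environment's locale if it handles UTF-8,
   otherwise fall back to "C.UTF-8". Returns the active LC_CTYPE name
   (a string a later setlocale call may overwrite), or NULL if neither works. */
const char *select_utf8_locale(void)
{
    const char *name = setlocale(LC_CTYPE, "");   /* from LC_ALL / LC_CTYPE / LANG */
    if (name != NULL && ctype_decodes_utf8())
        return name;

    name = setlocale(LC_CTYPE, "C.UTF-8");
    if (name != NULL && ctype_decodes_utf8())
        return name;
    return NULL;
}
```

It is safest to call this before the first getwc on a stream, since glibc, for example, picks a stream's conversion when the stream first becomes wide-oriented.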

3. Use the proper type for wide characters

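The return value of getwc isn't char, int, or even wchar_t; it's wint_t. Make sure your character variable c is a wint_t: wchar_t is not required to be able to represent WEOF, so storing the result in a wchar_t (let alone a char) before comparing can make WEOF indistinguishable from a valid character.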
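The usual pattern is to keep getwc's result in a wint_t until it has been compared with WEOF, and only then narrow it to wchar_t for storage. A minimal sketch; read_wide is a hypothetical helper name:

```c
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: read at most cap - 1 wide characters from `in`
   into `buf` and terminate it with L'\0'. Returns the number stored. */
size_t read_wide(FILE *in, wchar_t *buf, size_t cap)
{
    size_t n = 0;
    wint_t c;                        /* wint_t, so WEOF stays distinguishable */

    if (cap == 0)
        return 0;
    while (n + 1 < cap && (c = getwc(in)) != WEOF)
        buf[n++] = (wchar_t)c;       /* narrow only after the WEOF check */
    buf[n] = L'\0';
    return n;
}
```

After it returns, feof and ferror still tell a clean end of file apart from a decoding error.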
