The signedness of char is not standardized. Hence there are signed char and unsigned char types.
c = getchar();
if (c != EOF) islower((unsigned char) c);
warning: conversion to ‘char’ from ‘unsigned char’ may change the sign of the result
it would be legitimate to define wchar_t as char
if wchar_t is defined as char, the type wint_t must be defined as int due to the parameter promotion.
#include <ctype.h>
#include <wchar.h>
#include <wctype.h>

int main (void)
{
    /* 11111111 */
    char c = 'ÿ';
    if (islower(c)) return 0;
    wchar_t wc = L'ÿ';
    if (iswlower(wc)) return 0;
}
Why are there no unsigned wchar_t and signed wchar_t types?
Because C's wide-character handling facilities were defined such that they are not needed.
In more detail,
The signedness of char is not standardized.
To be precise, "The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char." (C2011, 6.2.5/15)
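Since the choice is left to the implementation, portable code can detect it but cannot assume it. A minimal sketch, using the standard CHAR_MIN macro from <limits.h>; the function names are hypothetical, and the second function assumes the usual 8-bit, two's complement char:

```c
#include <limits.h>
#include <stdbool.h>

/* True when plain char behaves like signed char on this implementation. */
bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Value obtained when the byte 0xFF is stored in a plain char:
   typically -1 where char is signed, 255 where it is unsigned
   (assumes CHAR_BIT == 8). */
int high_byte_as_char(void)
{
    char c = (char)0xFF;
    return c;
}
```

The same source therefore yields different values for high_byte_as_char() on different implementations, which is exactly the kind of variation the quoted paragraph permits.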
Hence there are signed char and unsigned char types.
"Hence" implies causation, which would be hard to argue clearly, but certainly
signed char and
unsigned char are more appropriate when you want to handle numbers, as opposed to characters. In particular, note that whereas the standard classifies char, signed char, and unsigned char all as character types, it lists only the latter two among the standard signed and unsigned integer types; plain char is an integer type but belongs to neither group.
Therefore functions which work with single character must use the argument type which can hold both signed char and unsigned char
No, not at all. Standard library functions that work with individual characters could easily be defined in terms of type
char, regardless of whether that type is signed, because the library implementation does know its signedness. If that were a problem then it would apply equally to the string functions, too --
char would be useless.
Your example of
getchar() is non-apposite. It returns
int rather than a character type because it needs to be able to return an error indicator that does not correspond to any character. Moreover, the code you present does not correspond to the accompanying warning message: it contains a conversion to unsigned char, but no conversion from unsigned char to char.
Some other character-handling functions accept
int parameters or return values of type
int both for compatibility with
getchar() and other stdio functions, and for historic reasons. In days of yore, you couldn't actually pass a
char at all -- it would always be promoted to
int, and that is what the functions would (and must) accept. One cannot later change the argument type, evolution of the language notwithstanding.
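To illustrate why the int interface fits together, here is a minimal sketch of a read loop. The helper name count_lower is hypothetical; getc(), islower(), and EOF are the standard facilities.

```c
#include <ctype.h>
#include <stdio.h>

/* Counts the lowercase letters in a stream (hypothetical helper). */
long count_lower(FILE *fp)
{
    long n = 0;
    int c;  /* int, not char: must hold every unsigned char value and EOF */

    while ((c = getc(fp)) != EOF) {
        /* c is already in the range of unsigned char here, so it can be
           passed to islower() without any conversion. */
        if (islower(c))
            n++;
    }
    return n;
}
```

If c were declared as char, a byte such as 0xFF could become indistinguishable from EOF where char is signed, and the loop could never end where char is unsigned.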
Further, the ISO C90 standard, where wchar_t was introduced, does not say anything specific about the representation of wchar_t.
C90 isn't really relevant any longer, but no doubt it says something very similar to C2011 (7.19/2), which describes wchar_t as
an integer type whose range of values can represent distinct codes for all members of the largest extended character set specified among the supported locales [...].
Your quotations from the glibc reference are non-authoritative, except possibly for glibc only. They appear in any case to be commentary, not specification, and it's unclear why you raise them. Certainly, though, at least the first is correct. Referring to the standard, if all the members of the largest extended character set specified among the locales supported by a given implementation could fit in a char, then that implementation could define wchar_t as char. Such implementations used to be much more common than they are today.
You ask several questions:
Private communication reveals that an implementation is allowed to support wide characters with >=0 value only (independently of signedness of
wchar_t). Anybody knows what this means?
I think it means that whoever communicated that to you doesn't know what they are talking about, or perhaps that what they are talking about is something different than the requirements placed by the C standard. You will find that in practice, character sets are defined with only non-negative character codes, but that is not a constraint placed by the C standard.
Does this mean that when wchar_t is a 16-bit type (for example), we can only use 15 bits to store the value of wide character?
The C standard does not say or imply that. You can store the value of any supported character in a
wchar_t. In particular, if an implementation supports a character set containing character codes exceeding 32767, then you can store those in a wchar_t.
In other words, is it true that a sign-extended wchar_t is a valid value?
The C standard does not say or imply that. It does not even say whether
wchar_t is a signed type (if not, then sign extension is meaningless for it). If it is a signed type, then there is no guarantee about whether sign-extending a value representing a character in some supported character set (which value could, in principle, be negative) will produce a value that also represents a character in that character set, or in any other supported character set. The same is true of adding 1 to a wchar_t.
Also, private communication reveals that the standard requires that any valid value of
wchar_t must be representable by
wint_t. Is it true?
It depends what you mean by "valid". The standard says that wint_t
is an integer type unchanged by default argument promotions that can hold any value corresponding to members of the extended character set, as well as at least one value that does not correspond to any member of the extended character set.
wchar_t must be able to hold any value corresponding to a member of the extended character set, in any supported locale.
wint_t must be able to hold all of those values, too. It may be, however, that
wchar_t is capable of representing values that do not correspond to any character in any supported character set. Such values are valid in the sense that the type can represent them.
wint_t is not required to be able to represent such values. For example, if no extended character set of any supported locale uses character codes greater than 32767, then an implementation would be free to implement
wchar_t as an unsigned 16-bit integer, and
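wint_t as a signed 16-bit integer (provided that type is unchanged by the default argument promotions, as on a platform where int is 16 bits wide).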
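The properties that the standard leaves open here can be inspected through the limit macros. A minimal sketch, with hypothetical function names; WCHAR_MIN and WINT_MIN are the standard macros from <stdint.h> (WCHAR_MIN is also provided by <wchar.h>):

```c
#include <stdbool.h>
#include <stdint.h>
#include <wchar.h>

/* An unsigned type has a minimum of 0; a signed type has a negative one. */

/* True when wchar_t is a signed type on this implementation. */
bool wchar_is_signed(void)
{
    return WCHAR_MIN != 0;
}

/* True when wint_t is a signed type on this implementation. */
bool wint_is_signed(void)
{
    return WINT_MIN != 0;
}
```

Neither result is fixed by the standard, and the two need not agree, so code that depends on either one is not portable.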
With respect to the character and wide-character classification functions, the only answer is that the differences simply arise from different specifications. The
char classification functions are defined to work with the same values that
getchar() is defined to return -- either EOF or a character value converted, if necessary, to
unsigned char. The wide character classification functions, on the other hand, accept arguments of type
wint_t, which can represent the values of all wide characters unchanged; therefore there is no need for a conversion.
You claim in this regard that
We need to use
iswlower((unsigned wchar_t)wc) here, but there is no unsigned wchar_t type.
No and maybe. You do not need to convert the
wchar_t argument to
iswlower() to any other type, and in particular, you do not need to convert it to an explicitly unsigned type. The wide character classification functions are not analogous to the regular character classification functions in this respect, having been designed with the benefit of hindsight. As for
unsigned wchar_t, C does not require such a type to exist, so portable code should not use it, but it may exist in some implementations.
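The difference between the two families can be sketched as a pair of wrappers. The names narrow_is_lower and wide_is_lower are hypothetical; islower() and iswlower() are the standard functions, and the behavior shown is for whatever locale is current (the "C" locale unless the program changes it).

```c
#include <ctype.h>
#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>

/* Narrow characters: convert to unsigned char first, because islower()
   accepts only EOF or a value representable as unsigned char. */
bool narrow_is_lower(char c)
{
    return islower((unsigned char)c) != 0;
}

/* Wide characters: the wchar_t value is passed as it is and converts
   implicitly to the wint_t parameter of iswlower(). */
bool wide_is_lower(wchar_t wc)
{
    return iswlower(wc) != 0;
}
```

With these, narrow_is_lower('a') and wide_is_lower(L'a') both report true, and only the narrow wrapper needs a cast.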