spirosbax spirosbax - 1 year ago 82
C Question

Example in Chapter 1, Section 1.5.1, of The C Programming Language, Second Edition (K&R):

int c;

while ((c = getchar()) != EOF)

"This value is called EOF, for "end of file". We must declare c to be
a type big enough to hold EOF in addition to any possible char.
Therefore we use int."

Correct me if I am wrong:

  • (signed) char = [-128, +127]

  • unsigned char = [0, 255]

  • EOF = -1

When I replace int c with char c in the above program, it seems to work as intended, but after some research I found out that it doesn't, because the variable c cannot store -1, aka EOF
( albeit using

I ran it anyway and tried to crash it. I tried to input a negative number like -1, but that didn't work; I believe that is because it is interpreted as two separate characters. I also tried the character corresponding to ASCII value 255 according to http://ascii-code.com/. So for what input will the above program (using char c instead of int c) crash?

(For information, I am using 64-bit Fedora Linux.)

Answer Source

It has been explained in other answers before, but sometimes it is harder to find the duplicate than to give the answer.

The plain char type can be signed or unsigned.

The function getchar() returns either EOF or …obtains that character as an unsigned char converted to an int (quoting the standard for fgetc(), but it applies to getchar() too).

If you have an unsigned plain char type, then the assignment will generate a value 0..255 (EOF itself is converted to 255 when EOF is -1), which will then be promoted to int for the comparison with EOF. Since none of the values 0..255 is negative, the result never compares equal to EOF, so the loop condition is always true and the loop won't stop until you terminate the program by some other means (interrupt, reboot, …).

If you have a signed plain char type, then the assignment maps one valid character (often ÿ, U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS, if you are using a single-byte code set such as ISO 8859-15) to the same value as EOF, so the loop cannot tell that byte from a real EOF and may terminate prematurely on some files.

So, depending on the machine, the loop:

char c;

while ((c = getchar()) != EOF)

may either be an infinite loop or it may terminate before EOF for some data files. Neither is correct behaviour — and neither behaviour is a crash. (The code in the question won't crash.) Changing the type of c to int fixes both problems reliably and portably.

Note that if you are working with a UTF-8 locale, you will not generate the hex 0xFF byte; that is not a valid byte in UTF-8 (U+00FF is encoded as two bytes 0xC3 0xBF in UTF-8).