Sergio Gliesh - 1 year ago
Java Question

Why does read() read one byte at a time if char is 2 bytes?

If a text file contains a Unicode character, mustn't it take 2 bytes of data? But the read() method reads one byte at a time and returns it as an int. So if we have an input stream and invoke read() once, storing the result in int x, how do we get the full character back from x if only one byte has been read? (The read() call is not in a loop or anything; it is just called once.)

Answer Source

Good question! You're right that a Java char is always two bytes, but that isn't true elsewhere (e.g. in the contents of a file).

A file is not encoded "in Unicode", because Unicode is a specification, not an encoding. Encodings map Unicode code points to byte sequences, and not all encodings use two bytes per character. A Java char is a UTF-16 code unit, which is always two bytes wide (characters outside the Basic Multilingual Plane take two chars). Many files, however, are stored as UTF-8, which is variable-width: ASCII characters take one byte, and all others take two to four.
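To make the width difference concrete, here is a minimal sketch that counts how many bytes the same text occupies in UTF-8 and in UTF-16. The class name EncodingWidths and its helper methods are made up for illustration; only String.getBytes and StandardCharsets come from the JDK.

```java
import java.nio.charset.StandardCharsets;

public class EncodingWidths {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the text occupies when encoded as UTF-16.
    // UTF_16BE is used instead of UTF_16 so no byte-order mark is prepended.
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // Samples are written as escapes so the source file's own encoding does not matter.
        String[] names = {"LATIN CAPITAL A", "LATIN SMALL E WITH ACUTE", "EURO SIGN"};
        String[] samples = {"A", "\u00e9", "\u20ac"};
        for (int i = 0; i < samples.length; i++) {
            System.out.println(names[i] + ": UTF-8 = " + utf8Length(samples[i])
                    + " bytes, UTF-16 = " + utf16Length(samples[i]) + " bytes");
        }
    }
}
```

For these three samples the UTF-8 lengths are 1, 2 and 3 bytes, while the UTF-16 length is 2 bytes each time.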

More to the point, however, InputStream is designed to read binary data, not characters, and binary data is (essentially) always read one byte at a time. If you want to read text, wrap your stream in a Reader such as InputStreamReader (preferably specifying the encoding explicitly) to convert the binary data into text. Internally the Reader consumes one or more bytes from the underlying stream in order to construct each character according to the encoding.
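The difference can be seen with a small sketch that reads the same UTF-8 bytes once through a raw InputStream and once through an InputStreamReader. The class ByteVsCharRead and its helpers are hypothetical names, and a ByteArrayInputStream stands in for a file so that the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ByteVsCharRead {
    // A single InputStream.read() call returns one raw byte (0-255), or -1 at end of stream.
    static int firstByte(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A single Reader.read() call returns one decoded char (a UTF-16 code unit), or -1 at end of stream.
    static int firstChar(byte[] data) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            return reader.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is the two bytes 0xC3 0xA9 in UTF-8.
        byte[] data = "\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.printf("InputStream.read(): 0x%X%n", firstByte(data)); // prints 0xC3
        System.out.printf("Reader.read():      0x%X%n", firstChar(data)); // prints 0xE9
    }
}
```

So a single read() on the stream yields only the first byte of the two-byte sequence, whereas a single read() on the Reader yields the whole character.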