Sergio Gliesh - 4 months ago
Java Question

Why does read() read one byte at a time if char is 2 bytes?

If we have a character in our text file which is in Unicode, mustn't it be 2 bytes of data? But the read() method reads one byte at a time as an int. So if we have a FileInputStream object fin and we invoke int x = fin.read() once, how do we get the full character back upon System.out.println(x) if only one byte has been read? (The fin.read() call is not in a while loop or anything; it is just called once.)

Answer

Good question! You're right that a Java char is always two bytes, but that isn't true elsewhere (e.g. in the contents of a file).

A file is not encoded "in Unicode" because Unicode is a specification, not an encoding. Encodings map the characters defined by Unicode to byte sequences, and not all such encodings use two bytes per character. A Java char is a UTF-16 code unit, which is always two bytes wide (characters outside the Basic Multilingual Plane need a pair of chars), but many files are stored as UTF-8, which is variable-width: ASCII characters take one byte, all others take two to four.
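To make the width difference concrete, here is a minimal sketch that counts how many bytes a few sample characters occupy in UTF-8 and in UTF-16. The class name EncodingWidths and its two helper methods are made up for illustration; only String.getBytes and StandardCharsets come from the JDK.

```java
import java.nio.charset.StandardCharsets;

public class EncodingWidths {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes when encoded as UTF-16. The big-endian variant is used
    // because it writes no byte-order mark, so the count is 2 bytes per char.
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // Letter A, e with acute accent, euro sign, and an emoji outside the BMP.
        String[] samples = {"A", "\u00e9", "\u20ac", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.println(s.length() + " char(s), "
                    + utf8Length(s) + " UTF-8 byte(s), "
                    + utf16Length(s) + " UTF-16 byte(s)");
        }
    }
}
```

The four samples should come out as 1, 2, 3 and 4 bytes in UTF-8 respectively, while the emoji is the only one that needs two Java chars.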

More to the point, however, InputStream is designed to read binary data, not characters, and binary data is (essentially) always read one byte at a time. So a single fin.read() does not give you a character at all: it returns the value of one byte (0 to 255, or -1 at end of stream), and System.out.println(x) prints that number. If you want to read text, wrap your stream in a Reader (preferably specifying the encoding explicitly) to convert the binary data into text. Internally it will call read() one or more times in order to properly construct a character from the sequence of bytes based on the encoding.
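The difference between the two kinds of read() can be sketched as follows. This is only an illustration under some assumptions: a ByteArrayInputStream holding the UTF-8 bytes of a euro sign stands in for the FileInputStream from the question, and the class and method names (ByteVersusChar, firstByte, firstChar) are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ByteVersusChar {
    // One call to InputStream.read(): a single raw byte as 0..255, or -1 at end.
    static int firstByte(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One call to Reader.read(): a single decoded char (UTF-16 code unit), or -1.
    // The reader consumes as many bytes as the encoding requires.
    static int firstChar(byte[] data) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            return reader.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The euro sign is the three bytes E2 82 AC in UTF-8.
        byte[] data = "\u20ac".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstByte(data));                      // 226, i.e. 0xE2
        System.out.println(Integer.toHexString(firstChar(data))); // 20ac
    }
}
```

With a real file, the same wrapping would apply to the FileInputStream, using whichever charset the file was actually written in.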
