onigunn onigunn - 1 year ago 198
Java Question

Reading UTF-8 - BOM marker

I'm reading a file through a FileReader - the file is UTF-8 decoded (with BOM) now my problem is: I read the file and output a string, but sadly the BOM marker is outputted too. Why this occurs?

fr = new FileReader(file);
br = new BufferedReader(fr);
String tmp = null;
while ((tmp = br.readLine()) != null) {
String text;
text = new String(tmp.getBytes(), "UTF-8");
content += text + System.getProperty("line.separator");

output after first line


Answer Source

In Java, you have to consume manually the UTF8 BOM if present. This behaviour is documented in the Java bug database, here and here. There will be no fix for now because it will break existing tools like JavaDoc or XML parsers. The Apache IO Commons provides a BOMInputStream to handle this situation.

Take a look at this solution: Handle UTF8 file with BOM

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download