Justin Moore - 2 months ago
Java Question

Problems Converting Between ByteBuffer and String in Java

I'm developing an application in which users can edit a ByteBuffer through a hex editor interface and also edit the corresponding text in a JTextPane. Because the JTextPane requires a String, I need to convert the ByteBuffer to a String before displaying it. During that conversion, however, invalid byte sequences are replaced by the charset's default replacement character. The original value is lost, so when I convert the String back to a ByteBuffer, each invalid byte has been replaced by the byte encoding of the replacement character. Is there an easy way to retain the byte value of an invalid character in a String? I've read the following Stack Overflow posts, but those askers usually just want to replace unprintable characters, whereas I need to preserve them.

Java ByteBuffer to String

Java: Converting String to and from ByteBuffer and associated problems

Is there an easy way to do this or do I need to keep track of all the changes that happen in the text editor and apply them to the ByteBuffer?

Here is code demonstrating the problem. The code uses byte[] instead of ByteBuffer but the issue is the same.

// Needs: import java.nio.charset.StandardCharsets; import java.util.Arrays;
byte[] temp = new byte[16];
// 0x99 is a UTF-8 continuation byte, so on its own it is not valid UTF-8
Arrays.fill(temp, (byte) 0x99);

System.out.println(Arrays.toString(temp));
// Prints [-103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103]
// -103 == 0x99

// The charset is given explicitly so the result does not depend on the platform default
System.out.println(new String(temp, StandardCharsets.UTF_8));
// Prints ����������������
// � (U+FFFD) is the default replacement character

// This takes the byte[], converts it to a String, and converts it back to a byte[]
System.out.println(Arrays.toString(new String(temp, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8)));
// I need this to print [-103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103]
// However, it prints
//[-17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67]
// Each -17, -65, -67 triple is the UTF-8 encoding (EF BF BD) of �

Answer

UTF-8 in particular goes wrong here, because not every byte sequence is valid UTF-8:

    byte[] bytes = {'a', (byte) 0xfd, 'b', (byte) 0xe5, 'c'};
    String s = new String(bytes, StandardCharsets.UTF_8);
    System.out.println("s: " + s);

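One needs a CharsetDecoder. With it, one can ignore (i.e. delete) or replace the offending bytes, or, by default, have the error reported: the convenience method decode(ByteBuffer) then throws a CharacterCodingException, while the three-argument decode returns an error CoderResult.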
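
To make the three options concrete, here is a small sketch using the standard CodingErrorAction constants. The class name DecoderActions and the helper decode are made up for illustration; only the java.nio.charset calls come from the JDK.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecoderActions {
    // Hypothetical helper: decodes UTF-8, applying the given action to malformed input.
    static String decode(byte[] bytes, CodingErrorAction action)
            throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        byte[] bytes = {'a', (byte) 0xfd, 'b', (byte) 0xe5, 'c'};
        try {
            // IGNORE: the offending bytes are silently dropped
            System.out.println(decode(bytes, CodingErrorAction.IGNORE));
            // REPLACE: each offending byte becomes U+FFFD
            System.out.println(decode(bytes, CodingErrorAction.REPLACE));
            // REPORT (the decoder's default): an exception is thrown
            System.out.println(decode(bytes, CodingErrorAction.REPORT));
        } catch (CharacterCodingException ex) {
            System.out.println("malformed input: " + ex);
        }
    }
}
```

None of the three actions keeps the original byte value, which is why the loop further down handles the reported errors by hand.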

For the JTextPane we can use HTML, so the hex code of each offending byte can be written in a <span> with a red background.

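    ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
    // A fresh decoder reports malformed input instead of replacing it
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    // An invalid byte expands to a 64-char <span>, so reserve generously
    CharBuffer charBuffer = CharBuffer.allocate(bytes.length * 80 + 6);
    charBuffer.append("<html>");
    for (;;) {
        // endOfInput = true, so a truncated sequence at the very end is reported too
        CoderResult result = decoder.decode(byteBuffer, charBuffer, true);
        if (!result.isError()) {
            break; // everything has been decoded
        }
        // The buffer position is at the offending byte: emit it as hex
        int b = 0xFF & byteBuffer.get();
        charBuffer.append(String.format(
            "<span style='background-color:red; font-weight:bold'> %02X </span>",
            b));
        decoder.reset();
    }
    // flip(), not rewind(): the limit must be set to the end of the written text
    charBuffer.flip();
    String t = charBuffer.toString();
    System.out.println("t: " + t);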

The API is not very elegant, but it is worth experimenting with.
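
One way to tidy the loop above into a reusable method might look like the following sketch. The names InvalidByteHighlighter and toHtml are hypothetical, and the HTML escaping of '<', '>' and '&' is an addition not present in the code above; it is there on the assumption that the decoded text should be shown literally rather than interpreted as markup.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class InvalidByteHighlighter {
    // Hypothetical helper: decodes UTF-8 and renders each invalid byte as a red hex <span>.
    static String toHtml(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // UTF-8 never yields more chars than input bytes, so this buffer cannot overflow.
        CharBuffer out = CharBuffer.allocate(bytes.length + 1);
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        StringBuilder html = new StringBuilder("<html>");
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            // Move the chars decoded so far into the builder, escaping HTML markup.
            out.flip();
            while (out.hasRemaining()) {
                char c = out.get();
                switch (c) {
                    case '<': html.append("&lt;"); break;
                    case '>': html.append("&gt;"); break;
                    case '&': html.append("&amp;"); break;
                    default: html.append(c);
                }
            }
            out.clear();
            if (!result.isError()) {
                break; // all input consumed
            }
            // The input position is at the offending byte: show it as hex.
            html.append(String.format(
                "<span style='background-color:red; font-weight:bold'> %02X </span>",
                0xFF & in.get()));
            decoder.reset();
        }
        return html.append("</html>").toString();
    }

    public static void main(String[] args) {
        byte[] bytes = {'a', (byte) 0xfd, 'b', (byte) 0xe5, 'c'};
        System.out.println(toHtml(bytes));
    }
}
```

Collecting the output in a StringBuilder avoids having to guess the size of the result in advance. Note that this only covers the display direction; mapping edits in the text pane back to bytes is still up to the application.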