Jus12 Jus12 - 2 months ago 11
Java Question

Byte array to String and back.. issues with -127

In the following:

scala> (new String(Array[Byte](1, 2, 3, -1, -2, -127))).getBytes
res12: Array[Byte] = Array(1, 2, 3, -1, -2, 63)


why is -127 converted to 63? and how do I get it back as -127

[EDIT:] Java version below (to show that its not just a "Scala problem")

c:\tmp>type Main.java
public class Main {
public static void main(String [] args) {
byte [] b = {1, 2, 3, -1, -2, -127};
byte [] c = new String(b).getBytes();
for (int i = 0; i < 6; i++){
System.out.println("b:"+b[i]+"; c:"+c[i]);
}
}
}
c:\tmp>javac Main.java
c:\tmp>java Main
b:1; c:1
b:2; c:2
b:3; c:3
b:-1; c:-1
b:-2; c:-2
b:-127; c:63

Answer

The constructor you're calling makes it non-obvious that binary-to-string conversions use a decoding: String(byte[] bytes, Charset charset). What you want is to use no decoding at all.

Fortunately, there's a constructor for that: String(char[] value).

Now you have the data in a string, but you want it back exactly as is. But guess what! getBytes(Charset charset) That's right, there's an encoding applied automatically also. Fortunately, there is a toCharArray() method.

If you must start with bytes and end with bytes, you then have to map the char arrays to bytes:

(new String(Array[Byte](1,2,3,-1,-2,-127).map(_.toChar))).toCharArray.map(_.toByte)

So, to summarize: converting between String and Array[Byte] involves encoding and decoding. If you want to put binary data in a string, you have to do it at the level of characters. Note, however, that this will give you a garbage string (i.e. the result will not be well-formed UTF-16, as String is expected to be), and so you'd better read it out as characters and convert it back to bytes.

You could shift the bytes up by, say, adding 512; then you'd get a bunch of valid single Char code points. But this is using 16 bits to represent every 8, a 50% encoding efficiency. Base64 is a better option for serializing binary data (8 bits to represent 6, 75% efficient).