B. Clement - 4 days ago

Apache Thrift: difference between byte and binary types

I would like to send 1024 bytes of data using Thrift. It must be exactly 1024 bytes because it is a comparative benchmark with other frameworks.

Thrift has two types to represent bytes, 'byte' and 'binary', but I don't know how to use them.
The 'binary' type is mapped to std::string, which seems strange to me (I don't understand why, or how to use it).
The 'byte' type is mapped to an 8-bit integer, which seems more logical to me.

To represent 1024 bytes of data, I use:

list<byte> byteSequence

with a size of 1024.

But a compile warning advises me to use binary instead of list<byte>. Why, and how?

I suspect I would get much better performance with 'binary', because list<byte> is strangely slow with a sequence of 1024 bytes.

Thank you.

Answer

It probably depends on the language you compile your Thrift files to, but binary tells Thrift directly that you want to transmit a sequence of raw, unencoded bytes.
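On the "how" for C++: the question notes that binary is mapped to std::string, and a std::string can hold arbitrary bytes, including zeros, because it tracks its length explicitly. Below is a minimal sketch of building a payload of exactly 1024 bytes. The helper names make_payload and byte_at are hypothetical and are not part of Thrift.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: build a payload of exactly n raw bytes.
// std::string stores its length explicitly, so embedded zero bytes
// are preserved and size() is the exact byte count.
std::string make_payload(std::size_t n) {
    std::string payload(n, '\0');
    for (std::size_t i = 0; i < n; ++i) {
        payload[i] = static_cast<char>(i & 0xFF);  // arbitrary test pattern
    }
    assert(payload.size() == n);
    return payload;
}

// Hypothetical helper: read one byte back as an unsigned value.
std::uint8_t byte_at(const std::string& payload, std::size_t i) {
    return static_cast<std::uint8_t>(payload[i]);
}
```

Assuming the IDL declares the field as binary, the result of make_payload(1024) would be assigned to the corresponding std::string member of the generated struct. Avoid constructing the string from a C-style char* without a length, since that stops at the first zero byte.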

It may not change things much at the transport layer in terms of size, but you may run into surprises when you instantiate or deserialise the objects in your chosen language. In Java, for example, a binary field is represented as a byte[], whereas list<byte> gives you a List<Byte>, which is a far less efficient way to represent the same thing.
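To make the size remark concrete, here is a rough sketch of the payload size for each type. It assumes the standard binary protocol (TBinaryProtocol) layout and leaves out the field header, which is the same for both types. Other protocols, such as the compact protocol, encode these differently. The function names are hypothetical.

```cpp
#include <cstddef>

// Assumed TBinaryProtocol layout (field header excluded):
//   list<byte>: 1-byte element type + 4-byte element count + 1 byte per element
//   binary    : 4-byte length prefix + the raw bytes
constexpr std::size_t list_of_byte_wire_size(std::size_t n) { return 1 + 4 + n; }
constexpr std::size_t binary_wire_size(std::size_t n) { return 4 + n; }
```

Under these assumptions, 1024 bytes of data occupy 1029 bytes as list<byte> and 1028 bytes as binary. Any speed difference is therefore more likely to come from per-element handling and the in-memory representation than from the bytes on the wire.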

Java might be the only reason binary exists. According to the Thrift documentation:

binary: a sequence of unencoded bytes

N.B.: This is currently a specialized form of the string type above, added to provide better interoperability with Java. The current plan-of-record is to elevate this to a base type at some point.
