teodron teodron - 1 year ago 67
Python Question

Separate binary data (blobs) in csv files

Is there any safe way of mixing binary with text data in a (pseudo)csv file?

One naive and partial solution would be:

  • using a compound field separator, made of more than one character (e.g. the
    sequence for example)

  • saving each field as either text or binary data, which would require the pseudo-CSV parser to look for the
    sequence and read the data between separators according to a known rule (e.g. a known header giving each field's name and type)

The core issue is that binary data may itself contain the
sequence somewhere inside its body, before the actual end of the data.

The proper solution would be to save the individual blob fields in their own separate physical files and only include the filenames in a .csv, but this is not acceptable in this scenario.

Is there any proper and safe solution, either already implemented or applicable given these restrictions?

Answer Source

If you need everything in a single file, just use one of the methods for encoding binary data as printable ASCII, and add the result to the CSV fields (letting the csv module add and escape quotes as needed).

One such method is base64. Python's base64 module also provides more space-efficient encodings such as base85 (base64.b85encode, available from Python 3.4 onwards).
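To make the size difference concrete, here is a minimal Python 3 sketch comparing the two encodings on the same payload. The 3000-byte random blob is just an assumption for illustration; base64 produces 4 output characters for every 3 input bytes, base85 produces 5 for every 4.

```python
import base64
import os

# Arbitrary binary payload, chosen only for illustration
blob = os.urandom(3000)

b64 = base64.b64encode(blob)  # 4 output bytes per 3 input bytes
b85 = base64.b85encode(blob)  # 5 output bytes per 4 input bytes

print(len(blob), len(b64), len(b85))

# Both encodings are lossless
assert base64.b64decode(b64) == blob
assert base64.b85decode(b85) == blob
```

Whichever encoding you choose, it is probably still best to let the csv module handle quoting of the resulting field rather than assuming the output alphabet is delimiter-free.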

So, an example in Python 2.7 would be:

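import base64
import csv
import random

# 50 random bytes standing in for a binary blob
data = b''.join(chr(random.randrange(0, 256)) for i in range(50))

# In Python 2, open the file in binary mode for the csv module
with open("testfile.csv", "wb") as f:
    writer = csv.writer(f)
    # base64 output is plain ASCII, so it is safe inside a CSV field
    writer.writerow(["some text", base64.b64encode(data)])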

Of course, you have to do the proper base64 decoding on reading the file as well - but it is certainly better than trying to create an ad-hoc escaping method.
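For completeness, here is a rough Python 3 sketch of the full round trip, including the decoding step on read. The helper names write_rows and read_rows are made up for this example, and an in-memory io.StringIO stands in for a real file opened with newline="".

```python
import base64
import csv
import io
import os

def write_rows(f, rows):
    """Write (text, blob) pairs; each blob (bytes) is stored as base64 text."""
    writer = csv.writer(f)
    for text, blob in rows:
        writer.writerow([text, base64.b64encode(blob).decode("ascii")])

def read_rows(f):
    """Yield (text, blob) pairs, decoding the second column back to bytes."""
    for text, encoded in csv.reader(f):
        yield text, base64.b64decode(encoded)

rows = [
    ("some text", os.urandom(50)),
    # A blob containing bytes that would break naive separator-based parsing
    ('with, comma and "quotes"', b'\x00\r\n,"\xff'),
]

# Stands in for open("testfile.csv", "w", newline="")
buf = io.StringIO(newline="")
write_rows(buf, rows)

buf.seek(0)
restored = list(read_rows(buf))
assert restored == rows
```

The text column is quoted and escaped by the csv module, and the binary column never contains anything but ASCII, so no custom separator is needed.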