MadcowD - 1 year ago 75
JSON Question

Deserialize json/yaml from binary stream which contains other data

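Suppose I have a binary stream that contains other data before and after a serialized object, and I generate it as follows.

stream.write('lol')
yaml.dump(some_obj, stream)
stream.write('awesome')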

Do I then have to write a custom parser of some sort for the stream, or can I recover the object as follows?

recovered = yaml.load(stream)

If this doesn't work with yaml serialization, does it work with json serialization?

Answer Source

You cannot do what you want, because the YAML parser consumes the complete stream. This is so even if you dump an explicit end marker with yaml.dump(some_obj, stream, explicit_end=True), which essentially inserts ...\n before awesome, and it also doesn't work when you write ---\nawesome (the document separator). The YAML parser consumes the word awesome¹ both when you use yaml.load() and when you use yaml.load_all().

The part up front works fine, so you can consider doing something like:

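import ruamel.yaml as yaml

file_name = 'test.comb'

some_obj = dict(a=[1, 2], b={3: 42})

with open(file_name, 'w') as stream:
    stream.write('lol')  # data in front of the YAML document
    yaml.dump(some_obj, stream, explicit_end=True)  # ends the document with '...\n'
    stream.write('awesome')  # data after the YAML document

with open(file_name) as stream:
    assert stream.read(3) == 'lol'
    # read one character at a time, up to and including the end marker
    stream_data = ''
    while True:
        stream_data += stream.read(1)
        if stream_data[-4:] == '...\n':
            break
    recovered = yaml.load(stream_data)
    print(recovered)
    assert stream.read() == 'awesome'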


which gives (in Python2):

{'a': [1, 2], 'b': {3: 42}}

and the file contents are:

lola: [1, 2]
b: {3: 42}
...
awesome

I use a similar technique for files that have a YAML header with metadata followed by normal text (non-indented, so Emacs can properly work on it). There I read the lines with for line in stream, which cannot be combined with normal read() operations.
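A minimal sketch of that header-plus-text approach, assuming the header is closed by a ... line on its own. The function name split_yaml_header and the sample text are made up for illustration. The function only separates the two parts; the header string would then be passed to yaml.load().

```python
import io


def split_yaml_header(stream, end_marker="..."):
    """Split a text stream into (header_text, body_text).

    The header runs up to and including the first line that consists
    solely of end_marker (an assumption about the file layout).
    """
    header_lines = []
    for line in stream:
        header_lines.append(line)
        if line.rstrip("\r\n") == end_marker:
            break
    else:
        # the loop ended without finding the marker
        raise ValueError("no end-of-document marker found")
    # Keep consuming by iteration, so iteration and read() are never mixed.
    body = "".join(stream)
    return "".join(header_lines), body


# hypothetical sample: YAML metadata, end marker, then plain text
sample = io.StringIO(
    "title: notes\n"
    "tags: [a, b]\n"
    "...\n"
    "Plain text starts here.\n"
    "Second line.\n"
)
header, body = split_yaml_header(sample)
print(header, end="")
print(body, end="")
```

Because the marker has to fill a whole line, a value that merely ends in three dots does not cut the header short, which the character-by-character check on the last four characters would do.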

¹ I consider reading past the end-of-stream marker (...) a bug in the Python YAML parser so I'll try and fix this in the next release.
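On the JSON part of the question: json.load(stream) from the standard library also reads the whole stream and rejects trailing data, but json.JSONDecoder.raw_decode() decodes a single value from a given position in a string and reports where it stopped. The following is a minimal sketch, assuming the whole stream fits in memory as text and the length of the leading data is known; the helper name read_embedded_json is made up for illustration. The keys are strings here, because JSON object keys always come back as strings.

```python
import json


def read_embedded_json(text, start):
    """Decode one JSON value that begins at text[start].

    Returns (value, end), where end is the index just past the value.
    """
    decoder = json.JSONDecoder()
    return decoder.raw_decode(text, start)


some_obj = {"a": [1, 2], "b": {"3": 42}}
payload = "lol" + json.dumps(some_obj) + "awesome"

# skip the three leading characters, then decode exactly one JSON value
value, end = read_embedded_json(payload, 3)
print(value)
print(payload[end:])
```

Note that raw_decode() does not skip leading whitespace, so start has to point at the first character of the JSON value.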