nowox nowox - 2 months ago 27
Python Question

How to dump YAML with explicit references?

Recursive references work great in ruamel.yaml or pyyaml:

$ ruamel.yaml.dump(ruamel.yaml.load('&A [ *A ]'))
'&id001
- *id001'


However it (obviously) does not work on normal references:

$ ruamel.yaml.dump(ruamel.yaml.load("foo: &foo { a: 42 }\nbar: { <<: *foo }"))
bar: {a: 42}
foo: {a: 42}
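
For context, the merge disappears here because a plain load expands the << key into ordinary keys before the data reaches Python, so the dumper never sees it. A minimal sketch of that expansion in plain Python, with no YAML library involved; expand_merge is a hypothetical helper written for illustration, not a function of ruamel.yaml or pyyaml:

```python
def expand_merge(mapping):
    """Return a copy of `mapping` with a '<<' entry folded into ordinary keys."""
    merged = {}
    sources = mapping.get('<<', [])
    if isinstance(sources, dict):
        sources = [sources]
    # Earlier sources take precedence over later ones, so apply them in reverse.
    for source in reversed(sources):
        merged.update(source)
    # Keys written explicitly in the mapping always win over merged keys.
    merged.update({k: v for k, v in mapping.items() if k != '<<'})
    return merged


foo = {'a': 42}
bar = expand_merge({'<<': foo, 'b': 43})
print(bar)  # {'a': 42, 'b': 43}
```

After this step nothing in bar records that 'a' came from foo, which is why the dump shows two independent mappings.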


What I would like is to explicitly create a reference:

data = {}
data['foo'] = {'foo': {'a': 42}}
data['bar'] = { '<<': data['foo'], 'b': 43 }

$ ruamel.yaml.dump(data, magic=True)
foo: &foo
  a: 42
bar:
  <<: *foo
  b: 43


This would be very useful for generating YAML output from large data structures that have lots of common keys.

How can this be done without resorting to a questionable regex replacement on the output?

Actually, the result of ruamel.yaml.dump(data) is:

bar:
  '<<': &id001
    foo:
      a: 42
  b: 43
foo: *id001
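
The generated &id001 / *id001 pair appears because the same Python object is reachable twice from data, and, as far as I understand, the dumper decides on anchors by object identity rather than by key name. A rough sketch of that detection in plain Python; find_shared is a hypothetical helper for illustration, not the library's implementation:

```python
def find_shared(node, seen=None, shared=None):
    """Collect dicts and lists that are reachable more than once from `node`."""
    seen = {} if seen is None else seen
    shared = [] if shared is None else shared
    if isinstance(node, (dict, list)):
        if id(node) in seen:
            # Second visit: this object would need an anchor and an alias.
            if not any(node is s for s in shared):
                shared.append(node)
            return shared
        seen[id(node)] = node
        children = node.values() if isinstance(node, dict) else node
        for child in children:
            find_shared(child, seen, shared)
    return shared


data = {}
data['foo'] = {'foo': {'a': 42}}
data['bar'] = {'<<': data['foo'], 'b': 43}

shared = find_shared(data)
print(shared)  # [{'foo': {'a': 42}}]
```

The shared object carries no name of its own, which is consistent with the dumper falling back to a generated anchor such as id001.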


So I need to replace '<<' with << and maybe replace id001 with foo.

Answer

If you want to create something like that, at least in ruamel.yaml ¹, you should use round-trip mode, which also preserves the merges. The following doesn't throw an assertion error:

import ruamel.yaml

yaml_str = """\
foo: &xyz
  a: 42
bar:
  <<: *xyz
"""

data = ruamel.yaml.round_trip_load(yaml_str)
assert ruamel.yaml.round_trip_dump(data) == yaml_str

What this means is that data has enough information to recreate the merge in the output as it was in the input. In practice, however, in round-trip mode the merge never takes place. Instead, retrieving a value such as data['bar']['a'] works even though there is no real key 'a' in data['bar']: the key is subsequently looked up in the attached "merge mappings".
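
To illustrate that lookup behaviour without depending on ruamel.yaml internals, here is a minimal sketch in plain Python. MergedView and its merge_sources attribute are hypothetical names chosen for this example; this is not the code of CommentedMap, only the same idea of falling back to attached mappings:

```python
class MergedView(dict):
    """A dict that consults attached mappings for keys it does not hold itself."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.merge_sources = []  # hypothetical attribute, consulted in order

    def __missing__(self, key):
        # dict.__getitem__ calls this only when the key is not a real key.
        for source in self.merge_sources:
            if key in source:
                return source[key]
        raise KeyError(key)


foo = {'a': 42}
bar = MergedView(b=43)
bar.merge_sources.append(foo)

print(bar['a'])    # 42, found through the attached mapping
print(dict(bar))   # {'b': 43}, 'a' was never copied into bar
foo['a'] = 7
print(bar['a'])    # 7, the lookup is live
```

Because nothing is copied, the relationship between the two mappings survives until dump time, which is what lets the merge be written out again.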

There is no public interface for this (so things might change), but by analyzing data and looking at ruamel.yaml.comments.CommentedMap() you can find that there is a merge_attrib (currently the string _yaml_merge) and, more usefully, a method add_yaml_merge(). The latter takes a list of (int, CommentedMap()) tuples.

baz = ruamel.yaml.comments.CommentedMap()
baz['b'] = 196
baz.yaml_set_anchor('klm')
data.insert(1, 'baz', baz)

You need to insert the 'baz' key before the 'bar' key of data; otherwise the mapping will reverse. After that, insert the new structure into the merge for data['bar']:

import sys

data['bar'].add_yaml_merge([(0, baz)])
ruamel.yaml.round_trip_dump(data, sys.stdout)

which gives:

foo: &xyz
  a: 42
baz: &klm
  b: 196
bar:
  <<: [*xyz, *klm]

(If you would like to see what add_yaml_merge does, insert

print(getattr(data['bar'], ruamel.yaml.comments.merge_attrib))

before and after the call.)

If you want to start completely from scratch, you can do:

data = ruamel.yaml.comments.CommentedMap([
    ('foo', ruamel.yaml.comments.CommentedMap([('a', 42)])),
    ])
data['foo'].yaml_set_anchor('xyz')
data['bar'] = bar = ruamel.yaml.comments.CommentedMap()
bar.add_yaml_merge([(0, data['foo'])])

instead of data = ruamel.yaml.round_trip_load(yaml_str).


¹ Disclaimer: I am the author of that package.
