Gee.E - 3 months ago
Python Question

Parse a YAML with duplicate anchors in Python

I'm just getting started with both YAML and Python, and I'm trying to parse a YAML document in Python that contains anchors and aliases.

In this YAML I redefine (reuse) anchors so that certain nodes get different values.

An example of my YAML:

Some Colors: &some_colors
  color_primary: &color_primary "#112233FF"
  color_secondary: &color_secondary "#445566FF"

Element: &element
  color: *color_primary

Overwrite some colors: &overwrite_colors
  color_primary: &color_primary "#000000FF"

Another element: &another_element
  color: *color_primary


The expected result (in JSON) is:

{
  "Some Colors": {
    "color_primary": "#112233FF",
    "color_secondary": "#445566FF"
  },
  "Element": {
    "color": "#112233FF"
  },
  "Overwrite some colors": {
    "color_primary": "#000000FF"
  },
  "Another element": {
    "color": "#000000FF"
  }
}



I tested the above YAML snippet here


From what I've read in the YAML docs, this should have been possible since version 1.1 (I think), and YAML 1.2 at least should support it.

But whenever I try to parse the YAML, using either PyYAML (with yaml.load()) or the ruamel.yaml package (with ruamel.yaml.load()), I get a 'duplicate anchor' error.
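The behaviour expected above can be modelled without any YAML library. The sketch below is a toy composer, not PyYAML's or ruamel.yaml's code: the node classes (ScalarNode, AliasNode, MapNode), the DuplicateAnchorWarning class and the load() helper are all hypothetical, and the first YAML document is written out by hand as nodes. It keeps an anchor table and resolves each alias to the most recently defined anchor of that name. With strict=True it rejects a reused anchor, which is roughly what the 'duplicate anchor' error amounts to; with strict=False the later definition replaces the earlier one and a warning is emitted instead.

```python
import warnings
from dataclasses import dataclass
from typing import Optional, Union


class DuplicateAnchorWarning(UserWarning):
    """Hypothetical warning class used only by this sketch."""


@dataclass
class ScalarNode:
    value: str
    anchor: Optional[str] = None


@dataclass
class AliasNode:
    name: str


@dataclass
class MapNode:
    items: list  # (key, node) pairs in document order
    anchor: Optional[str] = None


Node = Union[ScalarNode, AliasNode, MapNode]


def compose(node: Node, anchors: dict, strict: bool):
    """Resolve anchors and aliases depth-first, in document order."""
    if isinstance(node, AliasNode):
        if node.name not in anchors:
            raise ValueError(f"found undefined alias {node.name!r}")
        return anchors[node.name]  # most recent definition of that anchor
    if node.anchor is not None and node.anchor in anchors:
        if strict:
            raise ValueError(f"found duplicate anchor {node.anchor!r}")
        warnings.warn(f"reusing anchor {node.anchor!r}", DuplicateAnchorWarning)
    if isinstance(node, ScalarNode):
        result = node.value
        if node.anchor is not None:
            anchors[node.anchor] = result  # a later definition overwrites the earlier one
        return result
    result = {}
    if node.anchor is not None:
        anchors[node.anchor] = result  # registered first so nested aliases could refer back
    for key, child in node.items:
        result[key] = compose(child, anchors, strict)
    return result


def load(doc: Node, strict: bool = False):
    return compose(doc, {}, strict)


# The first YAML document from the question, written out by hand as nodes.
DOC = MapNode([
    ("Some Colors", MapNode([
        ("color_primary", ScalarNode("#112233FF", anchor="color_primary")),
        ("color_secondary", ScalarNode("#445566FF", anchor="color_secondary")),
    ], anchor="some_colors")),
    ("Element", MapNode([("color", AliasNode("color_primary"))], anchor="element")),
    ("Overwrite some colors", MapNode([
        ("color_primary", ScalarNode("#000000FF", anchor="color_primary")),
    ], anchor="overwrite_colors")),
    ("Another element", MapNode([("color", AliasNode("color_primary"))],
                                anchor="another_element")),
])

if __name__ == "__main__":
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DuplicateAnchorWarning)
        print(load(DOC)["Another element"])  # {'color': '#000000FF'}
    try:
        load(DOC, strict=True)
    except ValueError as exc:
        print(exc)  # found duplicate anchor 'color_primary'
```

The lenient mode reproduces the expected JSON; the strict mode fails at the second &color_primary, which is where the parsers in the question stop.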

What am I doing wrong here? And how do I fix this?

EDIT:

With the help of ruamel.yaml's owner I've found a solution to the above question.

As of ruamel.yaml 0.12.3 the above works as expected, although you will receive ReusedAnchorWarning warnings.

These warnings can be suppressed with the following snippet:

import warnings
from ruamel.yaml.error import ReusedAnchorWarning

warnings.simplefilter("ignore", ReusedAnchorWarning)
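The snippet above silences the warning for the whole process. If you would rather silence it only around the load call, warnings.catch_warnings() restores the previous filters when its block exits. In this sketch ReusedAnchorWarning is a local stand-in class and parse_with_reused_anchors() is a hypothetical placeholder for the real ruamel.yaml load call, so it runs without ruamel.yaml installed; in real code you would import the class from ruamel.yaml.error instead.

```python
import warnings


class ReusedAnchorWarning(Warning):
    """Local stand-in for ruamel.yaml.error.ReusedAnchorWarning, defined here
    only so this sketch runs without ruamel.yaml installed."""


def parse_with_reused_anchors():
    """Hypothetical placeholder for the real ruamel.yaml load call."""
    warnings.warn("found duplicate anchor 'color_primary'", ReusedAnchorWarning)
    return {"Another element": {"color": "#000000FF"}}


def load_quietly():
    # Ignore ReusedAnchorWarning only inside this block; catch_warnings()
    # restores the previous warning filters when the block exits.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", ReusedAnchorWarning)
        return parse_with_reused_anchors()


if __name__ == "__main__":
    print(load_quietly())  # the reused-anchor warning is not shown
```

Other warnings, and reused-anchor warnings raised outside load_quietly(), are still reported as usual.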


Credit where it's due: all of it goes to ruamel.yaml's owner.




As an additional question: when I modify the above YAML as follows (note the line marked "// <-- Added this"):

Some Colors: &some_colors
  color_primary: &color_primary "#112233FF"
  color_secondary: &color_secondary "#445566FF"

Element: &element
  color: *color_primary

Overwrite some colors: &overwrite_colors
  <<: *some_colors // <-- Added this to include 'color_secondary' as well
  color_primary: &color_primary "#000000FF"

Another element: &another_element
  color: *color_primary


The output is:

{
  "Some Colors": {
    "color_primary": "#000000FF",
    "color_secondary": "#445566FF"
  },
  "Element": {
    "color": "#112233FF"
  },
  "Overwrite some colors": {
    "color_primary": "#000000FF",
    "color_secondary": "#445566FF"
  },
  "Another element": {
    "color": "#445566FF" // <-- Now the value is 'color_secondary' instead of 'color_primary'?
  }
}


Why does the color of "Another element" pick up the value of color_secondary instead?

Is there any way to fix this as well?

Answer

First of all, you are not doing anything wrong; PyYAML is doing something wrong here. This is most likely because, for the PyYAML dumper, emitting two anchors with the same name would indicate an error. If you have a Python structure that is self-referential:

 a = dict(x=1)
 a['y'] = a

then PyYAML (and ruamel.yaml) will create a unique anchor name for it. If that name were not unique, what an alias refers to would depend on where in the document the alias is used. It therefore makes sense to be suspicious of any reused anchor name, as it might point to a bug in the YAML serialisation code, but reuse is not against the specification: it is already allowed by the YAML 1.0 spec (section 3.2.2.2).
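To see why a dumper never needs to reuse a name, here is a toy version of the anchor-assignment step. It is an illustration, not PyYAML's or ruamel.yaml's actual code, and assign_anchors() is a hypothetical helper: it walks a Python structure, notes every dict or list reached more than once (including through a self-reference like a['y'] = a), and gives each one a fresh name. The id001 format imitates the anchor names PyYAML generates. Because every name is fresh, an alias in output produced this way can never be ambiguous, which is why a reused anchor in input is at least worth a warning.

```python
def assign_anchors(root):
    """Map id() of every dict/list reached more than once to a fresh anchor name.

    Toy model of a dumper's anchor-assignment pass; not PyYAML's actual code.
    """
    seen = set()      # ids of containers already visited
    shared_ids = []   # ids of containers reached again, in discovery order

    def walk(obj):
        if not isinstance(obj, (dict, list)):
            return
        if id(obj) in seen:
            if id(obj) not in shared_ids:
                shared_ids.append(id(obj))
            return  # do not descend again; this keeps self-references finite
        seen.add(id(obj))
        for child in (obj.values() if isinstance(obj, dict) else obj):
            walk(child)

    walk(root)
    # One new name per shared container, so no two anchors can collide.
    return {obj_id: "id%03d" % n for n, obj_id in enumerate(shared_ids, start=1)}


a = dict(x=1)
a["y"] = a
print(assign_anchors(a)[id(a)])  # id001
```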

A bug report against the Debian python-yaml package has existed since 2009, but I haven't been able to find out whether it was ever forwarded upstream.

As you indicated, this is solved in ruamel.yaml 0.12.3.


To answer your second question: that is simply because the "Best Online YAML Converter" isn't, and it parses this incorrectly. It even throws an error if there is a YAML comment on the merge line:

 <<: *some_colors   # <-- Added this to include 'color_secondary' as well

This parses as expected in ruamel.yaml (0.12.3):

import sys
import ruamel.yaml
import warnings
from ruamel.yaml.error import ReusedAnchorWarning
warnings.simplefilter("ignore", ReusedAnchorWarning)

yaml_str = """\
Some Colors: &some_colors
 color_primary: &color_primary "#112233FF"
 color_secondary: &color_secondary "#445566FF"

Element: &element
 color: *color_primary

Overwrite some colors: &overwrite_colors
 <<: *some_colors   # <-- Added this to include 'color_secondary' as well
 color_primary: &color_primary "#000000FF"

Another element: &another_element
 color: *color_primary
"""


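# Module-level load()/round_trip_dump() as available in ruamel.yaml 0.12.x;
# newer releases replace these functions with the YAML() object API.
data = ruamel.yaml.load(yaml_str)
ruamel.yaml.round_trip_dump(data, sys.stdout)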

gives:

Some Colors:
  color_primary: '#112233FF'
  color_secondary: '#445566FF'
Overwrite some colors:
  color_primary: '#000000FF'
  color_secondary: '#445566FF'
Another element:
  color: '#000000FF'    # <- not #445566FF
Element:
  color: '#112233FF'

(comment added by hand)
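As a cross-check, the result ruamel.yaml gives can be reproduced by hand with plain dicts. This is a sketch, not how any YAML library is implemented: merge() is a hypothetical helper applying the merge-key rule (keys written explicitly in a mapping take precedence over keys inserted via <<, and the merged-in mapping itself is left unchanged), and the anchors dict models the rule that an alias refers to the most recent definition of its anchor. Both "Another element" getting #000000FF and "Some Colors" keeping #112233FF follow from those two rules.

```python
def merge(merged_in, explicit):
    """'<<' semantics: keys written in the mapping itself win over merged-in keys."""
    result = dict(merged_in)  # copy, so the anchored mapping itself is not changed
    result.update(explicit)
    return result


anchors = {}  # anchor name -> value; a redefinition simply replaces the entry

some_colors = {"color_primary": "#112233FF", "color_secondary": "#445566FF"}
anchors["some_colors"] = some_colors
anchors["color_primary"] = some_colors["color_primary"]
anchors["color_secondary"] = some_colors["color_secondary"]

element = {"color": anchors["color_primary"]}  # *color_primary, first definition

merged_in = anchors["some_colors"]                    # <<: *some_colors
anchors["color_primary"] = "#000000FF"                # &color_primary redefined
overwrite = merge(merged_in, {"color_primary": anchors["color_primary"]})

another = {"color": anchors["color_primary"]}  # *color_primary, second definition

doc = {
    "Some Colors": some_colors,
    "Element": element,
    "Overwrite some colors": overwrite,
    "Another element": another,
}
print(doc["Another element"])  # {'color': '#000000FF'}
```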
