JohnW - 21 days ago
YAML Question

How can I hook a filter in PyYAML parser?

I have a YAML file that I want to parse.

For multiple reasons, I want to forbid the use of the dot (.) in anchors, or just replace it with an underscore (_) at parsing time.

Simply, I want to go from this:

foo:
    bar.baz:
        - egg
        - spam


to that:

foo:
    bar_baz:
        - egg
        - spam


I am aware that this kind of transformation could be performed on the resulting Python dictionary but it is not the right place for it: either the parser should throw an error or it should replace the offending value.

I already tried to subclass Loader in order to do this kind of transformation, but none of the overridden functions seem to have any effect.

Answer

There is no easy mechanism for replacing keys, such as a hook through which each mapping key is passed (and you might want more context than the key alone anyway). There are several ways to solve this:

  • You could make a new Loader that has your own Constructor subclass, which does the transformation on the mapping keys. This is IMO the right solution, in that it doesn't influence the loading of other YAML. It is, however, also one of the trickier ones to get right.
  • You can add a new constructor for mappings to the Loader you are using, thereby overriding the existing one. This influences all later loading of YAML files unless you take special measures.
  • You can wrap the existing mapping constructor, load your YAML, and then put the original back. This doesn't influence the loading of further YAML files.
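The first approach in the list above might look roughly like the following for PyYAML. This is a minimal sketch, not a definitive implementation: DotKeyLoader and StrictDotKeyLoader are hypothetical names, and subclassing yaml.SafeLoader directly (instead of assembling a separate Constructor subclass into a new Loader) is an assumption made to keep the example short. The strict variant raises an error instead of replacing, which is the other behaviour the question asks about.

```python
import yaml
from yaml.constructor import ConstructorError


class DotKeyLoader(yaml.SafeLoader):
    """Hypothetical loader: rewrites '.' to '_' in string mapping keys."""

    strict = False  # subclasses set this to True to raise instead of replacing

    def construct_mapping(self, node, deep=False):
        if self.strict and isinstance(node, yaml.MappingNode):
            # Check the key nodes first so the error carries a source position.
            for key_node, _value_node in node.value:
                if (isinstance(key_node, yaml.ScalarNode)
                        and key_node.tag == 'tag:yaml.org,2002:str'
                        and '.' in key_node.value):
                    raise ConstructorError(
                        "while constructing a mapping", node.start_mark,
                        "found '.' in key %r" % key_node.value,
                        key_node.start_mark)
        mapping = super().construct_mapping(node, deep=deep)
        # Build a new dict rather than mutating one while iterating over it.
        return {
            (key.replace('.', '_') if isinstance(key, str) else key): value
            for key, value in mapping.items()
        }


class StrictDotKeyLoader(DotKeyLoader):
    """Hypothetical variant that rejects keys containing a dot."""

    strict = True


yaml_str = """\
foo:
    bar.baz:
        - egg
        - spam
"""

data = yaml.load(yaml_str, Loader=DotKeyLoader)
print(data)
```

Because the override lives on its own Loader class, yaml.safe_load and the other loaders are not affected. One caveat of this sketch: if a mapping contains both bar.baz and bar_baz, one of the two values is silently lost.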

The last of these approaches, wrapping the existing mapping constructor and putting the original back afterwards, can be done with:

import sys
import ruamel.yaml

yaml_str = """\
foo:
    bar.baz:
        - egg
        - spam
"""


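def alt_construct_mapping(self, *args, **kw):
    """replace dots in string keys with underscores"""
    m = self.org_construct_mapping(*args, **kw)
    # iterate over a copy of the keys, as the mapping is changed in the loop
    for k in list(m):
        if isinstance(k, str) and '.' in k:
            m[k.replace('.', '_')] = m.pop(k)
    return m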

# back up the constructor
ruamel.yaml.constructor.BaseConstructor.org_construct_mapping = \
    ruamel.yaml.constructor.BaseConstructor.construct_mapping

# replace the constructor
ruamel.yaml.constructor.BaseConstructor.construct_mapping = alt_construct_mapping

try:
    data = ruamel.yaml.safe_load(yaml_str)
    ruamel.yaml.round_trip_dump(data, sys.stdout)
finally:
    # put original constructor back, even if loading failed
    ruamel.yaml.constructor.BaseConstructor.construct_mapping = \
        ruamel.yaml.constructor.BaseConstructor.org_construct_mapping

which gives:

foo:
  bar_baz:
  - egg
  - spam

This was done using ruamel.yaml, an enhanced version of PyYAML, of which I am the author. For PyYAML this should work as well, as long as your YAML doesn't have any YAML version 1.2 constructs: replace ruamel.yaml with yaml, and round_trip_load/round_trip_dump with safe_load/safe_dump.
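For reference, a PyYAML port of the wrapping approach could be sketched as below. The context manager dots_replaced_in_keys is a hypothetical helper, not part of PyYAML; it is used here on the assumption that restoring the original constructor in a finally clause is the simplest way to make sure later loads are unaffected.

```python
import contextlib

import yaml
from yaml.constructor import BaseConstructor


@contextlib.contextmanager
def dots_replaced_in_keys():
    """Hypothetical helper: temporarily wrap PyYAML's mapping constructor."""
    original = BaseConstructor.construct_mapping

    def construct_mapping(self, *args, **kw):
        mapping = original(self, *args, **kw)
        for key in list(mapping):  # copy the keys: the dict is modified below
            if isinstance(key, str) and '.' in key:
                mapping[key.replace('.', '_')] = mapping.pop(key)
        return mapping

    BaseConstructor.construct_mapping = construct_mapping
    try:
        yield
    finally:
        # Restore the original constructor even if loading raised.
        BaseConstructor.construct_mapping = original


yaml_str = """\
foo:
    bar.baz:
        - egg
        - spam
"""

with dots_replaced_in_keys():
    data = yaml.safe_load(yaml_str)

print(yaml.safe_dump(data, default_flow_style=False))
```

While the with block is active, the patch applies to every pure-Python PyYAML loader in the process, so this sketch is not suitable when other threads load YAML at the same time.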