Anatoly Makarevich - 2 months ago
YAML Question

YAML - Serializing attributes which are types

I am having trouble serializing classes to YAML when they have type references as members. I am using ruamel.yaml in safe mode (typ="safe").

I ran all the following from a REPL prompt (to get multiple errors).

Initialization:

import sys
from ruamel.yaml import YAML, yaml_object

Y = YAML(typ="safe",pure=True)

# ==============

@yaml_object(Y)
class A(object):
    """Object I want to serialize"""
    yaml_tag = "!Aclass"
    def __init__(self, type):
        self.type = type
    def f(self):
        return self.type()
    pass

class T1(object):
    """This will be referenced."""
    pass

@yaml_object(Y)
class T2(object):
    """Another referenced object"""
    pass

class T3(object):
    """Yet another try"""
    pass
Y.register_class(T3.__class__)


Code that causes a failure:

Y.dump(A(T1), sys.stdout)
Y.dump(A(T2), sys.stdout)
Y.dump(A(T3), sys.stdout)
Y.dump(A(int), sys.stdout)


This outputs (only last lines of tracebacks):

ruamel.yaml.representer.RepresenterError: cannot represent an object: <attribute '__dict__' of 'T1' objects>
ruamel.yaml.representer.RepresenterError: cannot represent an object: <attribute '__dict__' of 'T2' objects>
ruamel.yaml.representer.RepresenterError: cannot represent an object: <attribute '__dict__' of 'T3' objects>
ruamel.yaml.representer.RepresenterError: cannot represent an object: <slot wrapper '__abs__' of 'int' objects>


I would appreciate any solution that lets me (safely) and uniquely save the type: I need both to create objects of that type and to check whether an incoming object is of that type. A function or class that generates the required type would not help, because it would not be serializable either.




P.S. I may also have found a bug: dumping the same argument repeatedly gives different behavior depending on whether the previous attempt to serialize it failed.

Y.dump(A(str), sys.stdout)
Y.dump(A(str), sys.stdout)
Y.dump(A(str), sys.stdout)
Y.dump(A(str), sys.stdout)


Outputs:

>>> Y.dump(A(str), sys.stdout)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Anaconda3\lib\site-packages\ruamel\yaml\main.py", line 352, in dump
    return self.dump_all([data], stream, _kw, transform=transform)
  File "C:\Program Files\Anaconda3\lib\site-packages\ruamel\yaml\main.py", line 383, in dump_all
    self.representer.represent(data)
  File "C:\Program Files\Anaconda3\lib\site-packages\ruamel\yaml\representer.py", line 73, in represent
    node = self.represent_data(data)
  File "C:\Program Files\Anaconda3\lib\site-packages\ruamel\yaml\representer.py", line 101, in represent_data
    node = self.yaml_representers[data_types[0]](self, data)
  File "C:\Program Files\Anaconda3\lib\site-packages\ruamel\yaml\main.py", line 552, in t_y
    tag, data, cls, flow_style=representer.default_flow_style)
  File "C:\Program Files\Anaconda3\lib\site-packages\ruamel\yaml\representer.py", line 371, in represent_yaml_object
    return self.represent_mapping(tag, state, flow_style=flow_style)
  File "C:\Program Files\Anaconda3\lib\site-packages\ruamel\yaml\representer.py", line 206, in represent_mapping
    node_value = self.represent_data(item_value)
  File "C:\Program Files\Anaconda3\lib\site-packages\ruamel\yaml\representer.py", line 101, in represent_data
    node = self.yaml_representers[data_types[0]](self, data)
  File "C:\Program Files\Anaconda3\lib\site-packages\ruamel\yaml\main.py", line 492, in t_y
    tag, data, cls, flow_style=representer.default_flow_style)
  File "C:\Program Files\Anaconda3\lib\site-packages\ruamel\yaml\representer.py", line 371, in represent_yaml_object
    return self.represent_mapping(tag, state, flow_style=flow_style)
  File "C:\Program Files\Anaconda3\lib\site-packages\ruamel\yaml\representer.py", line 206, in represent_mapping
    node_value = self.represent_data(item_value)
  File "C:\Program Files\Anaconda3\lib\site-packages\ruamel\yaml\representer.py", line 111, in represent_data
    node = self.yaml_representers[None](self, data)
  File "C:\Program Files\Anaconda3\lib\site-packages\ruamel\yaml\representer.py", line 375, in represent_undefined
    raise RepresenterError("cannot represent an object: %s" % data)
ruamel.yaml.representer.RepresenterError: cannot represent an object: <slot wrapper '__add__' of 'str' objects>
>>> Y.dump(A(str), sys.stdout)
!Aclass
type: !type {}
>>> Y.dump(A(str), sys.stdout)
Traceback (most recent call last):
# same traceback here
ruamel.yaml.representer.RepresenterError: cannot represent an object: <slot wrapper '__add__' of 'str' objects>
>>> Y.dump(A(str), sys.stdout)
!Aclass
type: !type {}
>>>

Answer

YAML expects to dump objects, and eventually does so by writing out scalar strings. T1 is a class rather than an instance (as are T2 and T3), and that is where the problem comes from. You can try to turn each class reference into an object and use tags on those, but IMO that merely complicates things.

Eventually it all boils down to getting a scalar representation, i.e. a string representation of the class, into the file, so you might as well adapt A() to dump a string representation directly and read it back:

import sys
from ruamel.yaml import YAML, yaml_object
from ruamel.yaml.compat import StringIO
from ruamel.yaml.scalarstring import DoubleQuotedScalarString


Y = YAML(typ="safe", pure=True)

# ==============

@yaml_object(Y)
class A(object):
    """Object I want to serialize"""
    yaml_tag = "!Aclass"
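    def __init__(self, type):
        self.type = type  # a class (e.g. T1), not an instance

    @classmethod
    def to_yaml(cls, representer, node):
        # dump only the bare class name as a tagged scalar, e.g. "!Aclass T1"
        return representer.represent_scalar(
            cls.yaml_tag, u'{}'.format(node.type.__name__)
        )

    @classmethod
    def from_yaml(cls, constructor, node):
        # to_yaml never writes a dotted name, so this branch only applies
        # to hand-written input such as "!Aclass some.module.Name"
        if '.' in node.value:  # in some other module
            m, n = node.value.rsplit('.', 1)
            return cls(getattr(sys.modules[m], n))
        else:
            # bare name: the class must be visible in this module's globals
            return cls(globals()[node.value])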


class T1(object):
    """This will be referenced."""
    pass


@yaml_object(Y)
class T2(object):
    """Another referenced object"""
    pass


class T3(object):
    """Yet another try"""
    pass
Y.register_class(T3)


for t in T1, T2, T3, DoubleQuotedScalarString:
    print('----------------------')
    x = StringIO()
    s = A(t)
    print('s', s.type)
    Y.dump(s, x)
    print(x.getvalue())

    d = Y.load(x.getvalue())
    print('d', d.type)

which gives:

----------------------
s <class '__main__.T1'>
!Aclass T1
...

d <class '__main__.T1'>
----------------------
s <class '__main__.T2'>
!Aclass T2
...

d <class '__main__.T2'>
----------------------
s <class '__main__.T3'>
!Aclass T3
...

d <class '__main__.T3'>
----------------------
s <class 'ruamel.yaml.scalarstring.DoubleQuotedScalarString'>
!Aclass DoubleQuotedScalarString
...

d <class 'ruamel.yaml.scalarstring.DoubleQuotedScalarString'>
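
The to_yaml above writes only the bare class name, so loading depends on the class being visible in globals() (DoubleQuotedScalarString works here only because it was imported). If the name has to identify the type uniquely, one option is to store a module-qualified name instead. The following is a minimal standard-library sketch of that conversion, not part of ruamel.yaml: the helper names type_to_name and name_to_type and the allowed_modules parameter are hypothetical, and it assumes top-level (non-nested) classes.

```python
import importlib


def type_to_name(tp):
    """Return 'module.ClassName' for a top-level class."""
    return "{}.{}".format(tp.__module__, tp.__name__)


def name_to_type(name, allowed_modules):
    """Resolve 'module.ClassName' back to the class.

    Only modules listed in allowed_modules are imported, so a YAML file
    cannot make the loader import arbitrary code.
    """
    module_name, _, attr = name.rpartition(".")
    if module_name not in allowed_modules:
        raise ValueError("module not allowed: {!r}".format(module_name))
    obj = getattr(importlib.import_module(module_name), attr)
    if not isinstance(obj, type):
        # refuse functions and other non-class attributes
        raise TypeError("{!r} is not a class".format(name))
    return obj
```

With these helpers, to_yaml could pass type_to_name(node.type) to represent_scalar, and from_yaml could return cls(name_to_type(node.value, allowed_modules)) for whatever set of modules you are willing to trust.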

If there are other attributes on A() that need to be dumped/loaded, you should create a dictionary (with .type converted to a string) and dump/load that.

I don't think you found a real bug. What you are seeing is a side effect of continuing after an error: the Y object (and its components) is left in an undefined state. You should not reuse a YAML() instance after catching an error; that should be made clearer in the documentation. So if you want to do a try/except in the for loop, you should move the Y = YAML(typ='safe', pure=True) inside the try part.