ruamel.yaml: loaded and dumped file is missing entries that have the same value

  1. I have a string that I load with ruamel.yaml and then dump to a file in YAML format.
  2. The string contains arrays with repeated values.
  3. When the values in an array are all the same, the repeats are missing from the output; when the values differ, all of them are printed.

Code:

import sys
import json
import ruamel.yaml
import re

dit="{p_d:  {p: a0, nb: 0, be: {ar: {1, 1, 1, 1}}, bb: {tt: {dt: {10, 10}, vl: {0}, rl: {0}, sf: {10, 20}, ef: {10, 20}}}}}"

yaml_str=dit
print(yaml_str)

dict_yaml_str = yaml_str.split('\n')

print('#### full block style')
yaml = ruamel.yaml.YAML(typ='safe') # 
yaml.default_flow_style = False
yaml.allow_duplicate_keys = True

data = ""
fileo = open("yamloutput.yaml", "w")
for dys in dict_yaml_str:
    data = yaml.load(dys)
    print("data: {}".format(data))

    yaml.dump(data, fileo)
fileo.close()

Output:

p_d:
  bb:
    tt:
      dt:
        10: null
      ef:
        10: null
        20: null
      rl:
        0: null
      sf:
        10: null
        20: null
      vl:
        0: null
  be:
    ar:
      1: null
  nb: 0
  p: a0

Expected Output:

p_d:
  bb:
    tt:
      dt:
        10: null
        10: null
      ef:
        10: null
        20: null
      rl:
        0: null
      sf:
        10: null
        20: null
      vl:
        0: null
  be:
    ar:
      1: null
      1: null
      1: null
      1: null
  nb: 0
  p: a0

Is there some YAML configuration option that I am missing? Please share your inputs.

There are 2 answers

0
Anthon On

It generally helps in finding a problem to minimize the code that reproduces it (i.e. don't import json and re, don't split a string on newlines when it contains no newline, and minimize the input).

You should never have to use yaml.allow_duplicate_keys, as it exists only to mimic faulty behaviour by PyYAML. The fault lies in the fact that YAML doesn't allow duplicate keys, yet PyYAML silently collapses them, selecting some value (IIRC the last, unless the merge key is involved).

If you leave out the allow_duplicate_keys, and minimize your input to reproduce the DuplicateKeyError, you'll see that the offending key is 1:

ruamel.yaml.constructor.DuplicateKeyError: while constructing a mapping
  in "<unicode string>", line 1, column 1
found duplicate key "1" with value "None" (original value: "None")
  in "<unicode string>", line 1, column 5

and that the value associated with 1 is None. That is because {1, 1, 1, 1} is a flow mapping, so it is loaded as if you had written {1: null, 1: null, 1: null, 1: null}.

If you use .allow_duplicate_keys, you don't get a Python dict with duplicate keys (which is what you seem to assume); you get a normal Python dict (which doesn't allow duplicate keys) with the value for that duplicate key set to the first value encountered:

import ruamel.yaml

yaml_str = """\
{1, 1, 1, 1}
"""

yaml = ruamel.yaml.YAML(typ='safe')
yaml.default_flow_style = False
# tolerate the repeated key instead of raising DuplicateKeyError
yaml.allow_duplicate_keys = True
data = yaml.load(yaml_str)
print(data)

which gives:

{1: None}

So your expectation that the above dumps as a dictionary with multiple items is incorrect.

2
blhsing On

As @Anthon pointed out, the YAML format does not actually allow mappings with duplicate keys, so you should consider revising your input data with a different structure.

However, if you still want to process the input data as-is, ruamel.yaml has made it fairly easy to customize its behaviors by using custom constructor and representer classes.

But first, we need a custom mapping class that supports duplicate keys. We can do that with a dict subclass that stores the values for each key in a list. For simplicity and demonstration purposes, only the __setitem__ and items methods are implemented here, which is enough to make the constructor and representer work:

class DuplicateKeyMapping(dict):
    def __setitem__(self, key, value):
        self.setdefault(key, []).append(value)

    def items(self):
        for key, lst in super().items():
            for value in lst:
                yield key, value
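To see what this class does on its own, here is a small standalone sketch that repeats the class definition and exercises it without any YAML involved (the names m and pairs are illustrative):

```python
class DuplicateKeyMapping(dict):
    def __setitem__(self, key, value):
        # Collect every assigned value in a per-key list instead of overwriting.
        self.setdefault(key, []).append(value)

    def items(self):
        # Yield one (key, value) pair per stored value, so repeats survive.
        for key, lst in super().items():
            for value in lst:
                yield key, value


m = DuplicateKeyMapping()
m[1] = None
m[1] = None
m[2] = 'x'

pairs = list(m.items())
print(pairs)   # one pair per assignment
print(len(m))  # number of distinct keys
print(m[1])    # the underlying list, since __getitem__ is not overridden
```

Note that only assignment and items() are aware of duplicates; other dict methods such as __getitem__, values() and len() still see the underlying per-key lists, so the class is suited to this demonstration rather than general use.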

Now, we need a constructor that converts mappings into objects of our custom mapping class instead of regular dicts. This can be done by subclassing ruamel.yaml.SafeConstructor and overriding the construct_mapping method, which we then register for the tag:yaml.org,2002:map tag:

from ruamel.yaml import YAML, SafeConstructor, SafeRepresenter

class Constructor(SafeConstructor):
    def construct_mapping(self, node, deep=False):
        mapping = DuplicateKeyMapping()
        for key_node, value_node in node.value:
            key = self.construct_object(key_node, deep)
            value = self.construct_object(value_node, deep)
            mapping[key] = value
        return mapping

Constructor.add_constructor('tag:yaml.org,2002:map', Constructor.construct_mapping)

And to allow our custom mapping class to be represented properly, we add a representer method for it in a ruamel.yaml.SafeRepresenter subclass, and register the representer method for the custom mapping class:

class Representer(SafeRepresenter):
    def represent_DuplicateKeyMapping(self, data):
        return self.represent_mapping('tag:yaml.org,2002:map', data)

Representer.add_representer(DuplicateKeyMapping, Representer.represent_DuplicateKeyMapping)

Finally, we instantiate a YAML instance and point its constructor and representer to our custom ones:

yaml = YAML(typ='safe')
yaml.allow_duplicate_keys = True
yaml.default_flow_style = False
yaml.Constructor = Constructor
yaml.Representer = Representer

so that:

import sys

s = "{p_d:  {p: a0, nb: 0, be: {ar: {1, 1, 1, 1}}, bb: {tt: {dt: {10, 10}, vl: {0}, rl: {0}, sf: {10, 20}, ef: {10, 20}}}}}"
o = yaml.load(s)
yaml.dump(o, sys.stdout)

outputs:

p_d:
  bb:
    tt:
      dt:
        10: null
        10: null
      ef:
        10: null
        20: null
      rl:
        0: null
      sf:
        10: null
        20: null
      vl:
        0: null
  be:
    ar:
      1: null
      1: null
      1: null
      1: null
  nb: 0
  p: a0

Demo: https://replit.com/@blhsing1/DoubleSuperficialServer