Parse a YAML with duplicate anchors in Python

Gee.E

I'm just getting started with both YAML and Python, and I'm trying to parse a YAML file in Python that contains anchors and aliases.
In this YAML I redefine some anchors so that later aliases to them resolve to different values.

An example of my YAML:

Some Colors: &some_colors
 color_primary: &color_primary "#112233FF"
 color_secondary: &color_secondary "#445566FF"

Element: &element
 color: *color_primary

Overwrite some colors: &overwrite_colors
 color_primary: &color_primary "#000000FF"

Another element: &another_element
 color: *color_primary

The expected outcome (in JSON) is:

{
    "Some Colors": {
        "color_primary": "#112233FF",
        "color_secondary": "#445566FF"
    },
    "Element": {
        "color": "#112233FF"
    },
    "Overwrite some colors": {
        "color_primary": "#000000FF"
    },
    "Another element": {
        "color": "#000000FF"
    }
}

I tested the above YAML snippet here

From what I've read in the YAML docs, this should have been possible since YAML 1.1 (I think), and YAML 1.2 should at least support it.

But whenever I try to parse the YAML, using PyYAML (with yaml.load()) or the ruamel.yaml package (with ruamel.yaml.load()), I get a 'duplicate anchor' error.

What am I doing wrong here, and how can I fix it?

EDIT:

With the help of the author of ruamel.yaml I've found a solution to the above question.

As of ruamel.yaml 0.12.3 the above works as expected, although you will receive ReusedAnchorWarning warnings.
These warnings can be suppressed with the following snippet:

import warnings
from ruamel.yaml.error import ReusedAnchorWarning

warnings.simplefilter("ignore", ReusedAnchorWarning)

Credit where it's due: all of it goes to the author of ruamel.yaml.
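The simplefilter() call above silences the warning for the whole program. If you would rather limit it to the load call, the standard library's warnings.catch_warnings() context manager restores the previous filters when the block exits. A minimal sketch follows; ReusedAnchorWarning and load_with_reused_anchors() here are hypothetical stand-ins so the snippet runs without ruamel.yaml installed (in real code you would import ReusedAnchorWarning from ruamel.yaml.error as above and call your actual loader).

```python
import warnings


class ReusedAnchorWarning(Warning):
    """Hypothetical stand-in for ruamel.yaml.error.ReusedAnchorWarning."""


def load_with_reused_anchors(text):
    """Hypothetical loader that warns the way ruamel.yaml does on a reused anchor."""
    warnings.warn("found duplicate anchor 'color_primary'", ReusedAnchorWarning)
    return text


def quiet_load(text):
    # The "ignore" filter only applies inside the with-block; the previous
    # filter list is restored on exit, so other code still sees the warning.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", ReusedAnchorWarning)
        return load_with_reused_anchors(text)
```

This keeps the warning visible everywhere else, which is useful if reused anchors are only expected in one particular file.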


As an added question; when I modify the above YAML to (notice the change at // <-- Added this):

Some Colors: &some_colors
 color_primary: &color_primary "#112233FF"
 color_secondary: &color_secondary "#445566FF"

Element: &element
 color: *color_primary

Overwrite some colors: &overwrite_colors
 <<: *some_colors   // <-- Added this to include 'color_secondary' as well
 color_primary: &color_primary "#000000FF"

Another element: &another_element
 color: *color_primary

The output is:

{
    "Some Colors": {
        "color_primary": "#000000FF",
        "color_secondary": "#445566FF"
    },
    "Element": {
        "color": "#112233FF"
    },
    "Overwrite some colors": {
        "color_primary": "#000000FF",
        "color_secondary": "#445566FF"
    },
    "Another element": {
        "color": "#445566FF" // <-- Now the value is 'color_secondary' instead of 'color_primary'?
    }
}

Why does the color of Another element resolve to the value of color_secondary instead?

Is there any way to fix this as well?

Anthon

First of all, you are not doing anything wrong; PyYAML is. This is most likely because, for the PyYAML dumper, emitting two anchors with the same name would indicate an error. If you have a Python structure that is self-referential:

a = dict(x=1)
a['y'] = a

then PyYAML (and ruamel.yaml) will generate a unique anchor name for it. If that name were not unique, what an alias refers to would depend on where in the document the alias was used. It therefore makes sense to be suspicious of any reused anchor name, as it might point to a bug in the YAML serialisation code, but reuse is not against the specification: it is already allowed by the YAML 1.0 spec (section 3.2.2.2).
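To illustrate why a dumper wants unique names: a serializer first finds every container that is reached more than once (such as a above, which contains itself) and then gives each one its own anchor. The sketch below models that pass with the standard library only; assign_anchors() is a hypothetical helper, not PyYAML's actual code, and it only borrows the id001-style naming that PyYAML uses in its output.

```python
def assign_anchors(root):
    """Return {id(container): anchor_name} for every dict/list that is
    reached more than once while walking root (hypothetical helper)."""
    visits = {}  # id(container) -> how many times it was reached
    order = []   # containers in first-visit order

    def visit(node):
        if not isinstance(node, (dict, list)):
            return  # scalars never get an anchor in this model
        key = id(node)
        visits[key] = visits.get(key, 0) + 1
        if visits[key] > 1:
            return  # already walked; stopping here also breaks cycles
        order.append(node)
        for child in (node.values() if isinstance(node, dict) else node):
            visit(child)

    visit(root)
    anchors = {}
    for node in order:
        if visits[id(node)] > 1:
            # Number anchors in first-visit order: id001, id002, ...
            anchors[id(node)] = "id%03d" % (len(anchors) + 1)
    return anchors


a = dict(x=1)
a['y'] = a
print(assign_anchors(a) == {id(a): "id001"})  # True
```

Because every shared container gets a fresh number, each alias in the dumped output can only point at one node, regardless of where it appears.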

A bug report against the Debian python-yaml package has existed since 2009, but I haven't found out whether it was ever forwarded upstream.

As you indicated, this is solved in ruamel.yaml 0.12.3.
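Conceptually, the difference between the erroring and the fixed behaviour is what the composer does when it meets an anchor name that is already in its table: raise, or warn and rebind the name. Since an alias refers to the most recent preceding node with that anchor, rebinding is all that is needed. A minimal model, assuming a flat list of hypothetical ('anchor', name, value) and ('alias', name) events in document order; resolve(), DuplicateAnchorError and the ReusedAnchorWarning class below are stand-ins, not PyYAML or ruamel.yaml internals.

```python
import warnings


class ReusedAnchorWarning(Warning):
    """Stand-in for ruamel.yaml.error.ReusedAnchorWarning."""


class DuplicateAnchorError(Exception):
    """Hypothetical stand-in for the error PyYAML raises on a reused anchor."""


def resolve(events, strict=False):
    """Resolve anchors/aliases in document order, the way a composer would."""
    anchors = {}  # anchor name -> value of the most recent node with that anchor
    values = []
    for event in events:
        kind, name = event[0], event[1]
        if kind == "anchor":
            value = event[2]
            if name in anchors:
                message = "found duplicate anchor %r" % name
                if strict:  # old behaviour: refuse the document
                    raise DuplicateAnchorError(message)
                warnings.warn(message, ReusedAnchorWarning)
            anchors[name] = value  # rebind: later aliases see the new node
            values.append(value)
        elif kind == "alias":
            values.append(anchors[name])  # undefined alias -> KeyError here
        else:
            raise ValueError("unknown event kind %r" % kind)
    return values


# The color_primary anchors and aliases of the question's first document.
question_doc = [
    ("anchor", "color_primary", "#112233FF"),  # Some Colors
    ("alias", "color_primary"),                # Element
    ("anchor", "color_primary", "#000000FF"),  # Overwrite some colors
    ("alias", "color_primary"),                # Another element
]

with warnings.catch_warnings():
    warnings.simplefilter("ignore", ReusedAnchorWarning)
    print(resolve(question_doc))
# prints ['#112233FF', '#112233FF', '#000000FF', '#000000FF']
```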


To answer your second question: that is just because the "Best Online YAML Converter" isn't, and parses this incorrectly. It even throws an error if there is a YAML comment on the merge line:

 <<: *some_colors   # <-- Added this to include 'color_secondary' as well

This parses as expected in ruamel.yaml (0.12.3):

import sys
import ruamel.yaml
import warnings
from ruamel.yaml.error import ReusedAnchorWarning
warnings.simplefilter("ignore", ReusedAnchorWarning)

yaml_str = """\
Some Colors: &some_colors
 color_primary: &color_primary "#112233FF"
 color_secondary: &color_secondary "#445566FF"

Element: &element
 color: *color_primary

Overwrite some colors: &overwrite_colors
 <<: *some_colors   # <-- Added this to include 'color_secondary' as well
 color_primary: &color_primary "#000000FF"

Another element: &another_element
 color: *color_primary
"""


data = ruamel.yaml.safe_load(yaml_str)
ruamel.yaml.round_trip_dump(data, sys.stdout)

gives:

Some Colors:
  color_primary: '#112233FF'
  color_secondary: '#445566FF'
Overwrite some colors:
  color_primary: '#000000FF'
  color_secondary: '#445566FF'
Another element:
  color: '#000000FF'    # <- not #445566FF
Element:
  color: '#112233FF'

(comment added by hand)
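Two rules explain the correct output. First, with the merge key <<, keys written explicitly in a mapping take precedence over keys merged in, so Overwrite some colors keeps its own color_primary and only inherits color_secondary; Some Colors itself is not modified. Second, the second &color_primary labels the "#000000FF" scalar, so the later *color_primary can only resolve to that value. A minimal plain-dict model of the merge rule (merge_mapping() is a hypothetical helper, not a ruamel.yaml function):

```python
def merge_mapping(explicit, merge_sources):
    """Model of the '<<' merge key on plain dicts.

    explicit: the keys written directly in the mapping.
    merge_sources: the mapping(s) named after '<<', in document order.
    Explicit keys win; among several sources, earlier ones win.
    """
    result = {}
    # Apply later sources first so that earlier sources overwrite them.
    for source in reversed(merge_sources):
        result.update(source)
    # Explicit keys override anything that was merged in.
    result.update(explicit)
    return result


some_colors = {"color_primary": "#112233FF", "color_secondary": "#445566FF"}
overwrite_colors = merge_mapping({"color_primary": "#000000FF"}, [some_colors])
print(overwrite_colors)
# prints {'color_primary': '#000000FF', 'color_secondary': '#445566FF'}
```

The merge builds a new mapping, which is why Some Colors still shows "#112233FF" in the ruamel.yaml output while the converter's "#000000FF" for it is wrong.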

If you want to use the new API with the safe loader, make sure to specify pure=True. Otherwise ruamel.yaml's copy of libyaml (which still has this bug) will be used and you'll get the ComposerError. With pure=True:

yaml = ruamel.yaml.YAML(typ='safe', pure=True)
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
