Python Yaml parse inf as float

Erotemic

In PyYAML or ruamel.yaml, I'm wondering if there is a way to customize the parsing of specific strings. Specifically, I'd like to be able to parse "[inf, nan]" as [float('inf'), float('nan')]. I'll also note that I would like "['inf', 'nan']" to continue to parse as ['inf', 'nan'], so it is only the unquoted variant whose current behavior I'd like to intercept and change.

I'm aware that currently I could use "[.inf, .nan]" or "[!!float inf, !!float nan]", but I'm curious if I could extend the Loader to allow for the syntax that I expected would have worked (but doesn't).

Perhaps I'm just making a footgun by allowing "nan" and "inf" to be parsed as floats rather than strings, and I'm interested in hearing compelling reasons why I should not allow this type of parsing. But I'm not too worried about the case where other parsers would parse my configs incorrectly (though maybe I'm underestimating the pain that will cause in the future). I plan to use this as a one-way convenience when parsing arguments on the command line, and I don't expect actual config files to be written like this.

In any case I'd still be interested in how it could be done, even if the conclusion is that it shouldn't be done.

Anthon

Based on the confusion that I have seen caused by Yes, On, No and Off being interpreted as boolean values in YAML 1.1, I don't think this is a good idea.
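To make that comparison concrete, here is a minimal, standard-library-only sketch of the YAML 1.1 behaviour. The pattern is an assumption modelled on the boolean resolver that PyYAML ships for YAML 1.1, and `looks_like_yaml11_bool` is a hypothetical helper name used only for this illustration:

```python
import re

# Assumption: modelled on the YAML 1.1 boolean pattern used by PyYAML;
# reproduced here for illustration, not imported from any library.
YAML11_BOOL = re.compile(
    r'^(?:yes|Yes|YES|no|No|NO'
    r'|true|True|TRUE|false|False|FALSE'
    r'|on|On|ON|off|Off|OFF)$'
)

def looks_like_yaml11_bool(scalar):
    """Return True if an unquoted scalar would be claimed as a boolean."""
    return YAML11_BOOL.match(scalar) is not None

# Plain words that a config author may well have meant as strings.
for word in ['NO', 'on', 'Off', 'yes', 'nope']:
    print(f'{word!r:8} -> {looks_like_yaml11_bool(word)}')
```

A country code such as NO or a switch label such as on silently stops being a string under these rules, which is the kind of surprise that unquoted inf and nan could cause as well.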

But it is possible to do this both in ruamel.yaml and PyYAML, by changing the regex that recognises floats (i.e. that assigns the implicit tag tag:yaml.org,2002:float to the scalar) and then to make sure the routine constructing a float from a scalar handles these additional scalars. The three main improvements (with regard to this) in ruamel.yaml are that it has different regexes for YAML 1.1 and YAML 1.2 parsing (the latter being the default, the former having to be specified either by a directive, or by setting .version on the YAML() instance); that the various Resolvers each have a copy of these regexes instead of sharing one (as in PyYAML, which makes having multiple, differently behaving parsers in one program difficult); and that regex compilation is delayed until they are actually needed.
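The regex half of that change can be explored without any YAML library. The sketch below uses only the `re` module and two reduced patterns that cover just the inf/nan alternatives: `DOTTED` is modelled on the stock form where the leading dot is required, and `RELAXED` makes the dot optional. Both names, and the helper `is_float_keyword`, are hypothetical and exist only for this illustration.

```python
import re

# Stock-style alternatives: the leading dot is mandatory.
DOTTED = re.compile(r'^(?:[-+]?\.(?:inf|Inf|INF)|\.(?:nan|NaN|NAN))$')
# Relaxed alternatives: the "?" after the dot makes it optional.
RELAXED = re.compile(r'^(?:[-+]?\.?(?:inf|Inf|INF)|\.?(?:nan|NaN|NAN))$')

def is_float_keyword(pattern, scalar):
    """Return True if the whole scalar matches the given pattern."""
    return pattern.match(scalar) is not None

# Print, per scalar, whether the dotted and the relaxed pattern accept it.
for scalar in ['.inf', 'inf', '-inf', 'nan', '.NaN', 'nano', 'info']:
    print(scalar,
          is_float_keyword(DOTTED, scalar),
          is_float_keyword(RELAXED, scalar))
```

Because both patterns are anchored with ^ and $, words such as nano and info are still left alone. These patterns are only consulted for plain (unquoted) scalars, so a quoted 'inf' should keep resolving to a string.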

Given these differences, the following applies only to ruamel.yaml.

You need to create a resolver, and replace its regex recognition for all floats, and then create a constructor that constructs the floats based on the recognised scalars:

import re, sys
import ruamel.yaml

class NanInfResolver(ruamel.yaml.resolver.VersionedResolver):
    pass

# difference with the regex in resolver.py is the ? after \\.
# as well as recognising N and I as starting chars
# no delayed compile of the regex here
NanInfResolver.add_implicit_resolver(
    'tag:yaml.org,2002:float',
    re.compile('''^(?:
     [-+]?(?:[0-9][0-9_]*)\\.[0-9_]*(?:[eE][-+]?[0-9]+)?
    |[-+]?(?:[0-9][0-9_]*)(?:[eE][-+]?[0-9]+)
    |[-+]?\\.[0-9_]+(?:[eE][-+][0-9]+)?
    |[-+]?\\.?(?:inf|Inf|INF)       
    |\\.?(?:nan|NaN|NAN))$''', re.X),
    list('-+0123456789.niNI')
)

class NanInfConstructor(ruamel.yaml.constructor.RoundTripConstructor):
    def construct_yaml_float(self, node):
        value = self.construct_scalar(node).lower()
        # remember the sign, then strip it so the bare keyword can be compared
        sign = +1
        if value[0] == '-':
            sign = -1
        if value[0] in '+-':
            value = value[1:]
        if value == 'inf':
            return sign * self.inf_value
        if value == 'nan':
            return self.nan_value
        # everything else, including .inf and .nan, goes to the base class
        return super().construct_yaml_float(node)

NanInfConstructor.add_constructor(
    'tag:yaml.org,2002:float', NanInfConstructor.construct_yaml_float
)



yaml_str = """\
[nano, 1.0, .NaN, inf, nan]  # some extra values to test
"""
    
yaml = ruamel.yaml.YAML()
yaml.Resolver = NanInfResolver
yaml.Constructor = NanInfConstructor

data = yaml.load(yaml_str)
for x in data:
    print(type(x), x)
print()
yaml.dump(data, sys.stdout)

which gives:

<class 'str'> nano
<class 'ruamel.yaml.scalarfloat.ScalarFloat'> 1.0
<class 'float'> nan
<class 'float'> inf
<class 'float'> nan

[nano, 1.0, .nan, .inf, .nan] # some extra values to test

That 1.0 is loaded as a ScalarFloat is necessary to preserve its formatting when dumping. It is possible to preserve the different ways of writing .nan, .inf, nan and inf in a similar way, but you would have to make a special representer and either extend ScalarFloat or make one or more explicit types that keep the original scalar string value. Either way, you would lose the possibility to test with x is float('nan'), which may be a problem in real programs (this is also the reason why ruamel.yaml doesn't preserve the different forms of null during round-trip).
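As a rough illustration of that trade-off, the sketch below compares a plain float NaN with a float subclass that remembers its source text. `TaggedFloat` is a hypothetical class invented for this example; it is not part of ruamel.yaml and only stands in for "an explicit type that keeps the original scalar string".

```python
import math

class TaggedFloat(float):
    """Hypothetical float subclass that remembers its source text."""
    def __new__(cls, value, original):
        obj = super().__new__(cls, value)
        obj.original = original
        return obj

plain = float('nan')
tagged = TaggedFloat('nan', original='nan')

# math.isnan accepts both the plain float and the subclass instance.
print(math.isnan(plain), math.isnan(tagged))
# Exact-type checks distinguish them; isinstance does not.
print(type(plain) is float, type(tagged) is float)
print(isinstance(tagged, float))
# Comparing against NaN by equality or identity is fragile even for plain floats.
print(plain == plain, plain is float('nan'))
```

Code that checks values with math.isnan or isinstance keeps working with such a subclass, whereas code that relies on the exact type or on identity would need to be adapted.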
