Python Yaml parse inf as float

Erotemic Published at Dev

Erotemic

In PyYaml or ruamel.yaml I'm wondering if there is a way to handle parsing of specific strings. Specifically, I'd like to be able to parse "[inf, nan]" as [float('inf'), float('nan')]. I'll also note that I would like "['inf', 'nan']" to continue to parse as ['inf', 'nan'], so it's just the unquoted variant that I'd like to intercept and change the current behavior.

I'm aware that currently I could use "[.inf, .nan]" or "[!!float inf, !!float nan]", but I'm curious if I could extend the Loader to allow for the syntax that I expected would have worked (but doesn't).

Perhaps I'm just making a footgun by allowing "nan" and "inf" to be parsed as floats rather than strings - and I'm interested in hearing compelling reasons that I should not allow for this type of parsing. But I'm not too woried about the case where other parses would parse my configs incorrectly (but maybe I'm underestimating the pain that will cause in the future). I plan to use this as a one way convineince in parsing arguments on the command line, and I don't expect actual config files to be written like this.

In any case I'd still be interested in how it could be done, even if the conclusion is that it shouldn't be done.

Anthon

Based on the confusion that I have seen caused by Yes, On, No and Off being interpreted as boolean values in YAML 1.1, I don't think this is a good idea.

But it is possible to do this both in ruamel.yaml and PyYAML, by changing the regex that recognises floats (i.e. that assigns the implicit tag tag:yaml.org,2002:float to the scalar) and then to make sure the routine constructing a float from a scalar handles these additional scalars. The three main improvements (with regard to this) in ruamel.yaml are that it has different regexes for YAML 1.1 and YAML 1.2 parsing (the latter being the default, the former having to be specified either by a directive, or by setting .version on the YAML() instance); that the various Resolvers each have a copy of these regexes instead of sharing one (as in PyYAML, which makes having multiple, differently behaving parsers in one program difficult); and that regex compilation is delayed until they are actually needed.

Given the differences, the following will only apply to ruamel.yaml

You need to create a resolver, and replace its regex recognition for all floats, and then create a constructor that constructs the floats based on the recognised scalars:

import re, sys
import ruamel.yaml

class NanInfResolver(ruamel.yaml.resolver.VersionedResolver):
    pass

# difference with the regex in resolver.py is the ? after \\.
# as well as recognising N and I as starting chars
# no delayed compile of the regex here
NanInfResolver.add_implicit_resolver(
    'tag:yaml.org,2002:float',
    re.compile('''^(?:
     [-+]?(?:[0-9][0-9_]*)\\.[0-9_]*(?:[eE][-+]?[0-9]+)?
    |[-+]?(?:[0-9][0-9_]*)(?:[eE][-+]?[0-9]+)
    |[-+]?\\.[0-9_]+(?:[eE][-+][0-9]+)?
    |[-+]?\\.?(?:inf|Inf|INF)       
    |\\.?(?:nan|NaN|NAN))$''', re.X),
    list('-+0123456789.niNI')
)

class NanInfConstructor(ruamel.yaml.constructor.RoundTripConstructor):
    def construct_yaml_float(self, node):
        value = self.construct_scalar(node).lower()
        sign = +1
        if value[0] == '-':
            sign = -1
        if value[0] in '+-':
            value_s = value_s[1:]
        if value == 'inf':
            return sign * self.inf_value
        if value == 'nan':
            return self.nan_value
        return super().construct_yaml_float(node)

NanInfConstructor.add_constructor(
    'tag:yaml.org,2002:float', NanInfConstructor.construct_yaml_float
)



yaml_str = """\
[nano, 1.0, .NaN, inf, nan]  # some extra values to test
"""
    
yaml = ruamel.yaml.YAML()
yaml.Resolver = NanInfResolver
yaml.Constructor = NanInfConstructor

data = yaml.load(yaml_str)
for x in data:
    print(type(x), x)
print()
yaml.dump(data, sys.stdout)

which gives:

<class 'str'> nano
<class 'ruamel.yaml.scalarfloat.ScalarFloat'> 1.0
<class 'float'> nan
<class 'float'> inf
<class 'float'> nan

[nano, 1.0, .nan, .inf, .nan] # some extra values to test

That 1.0 is loaded as a ScalarFloat is necessary to preserve its formatting when dumping. It is possible to preserve the different ways of writing .nan, .inf, nan and inf in a similar way, but you would have to make a special representer and either extend ScalarFloat or make one or more explicit types that keep the the original scalar string value. Either way you would lose the possibility to test with x is float('nan') which may be a problem in real programs (which is also the reason why ruamel.yaml doesn't preserve the different forms of null during round-trip).

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at2023-09-12

Comments

0 comments

TOP Ranking

Article

Python Yaml parse inf as float

Python Yaml parse inf as float

pump.io port in URL

Loopback Error: connect ECONNREFUSED 127.0.0.1:3306 (MAMP)

Can't pre-populate phone number and message body in SMS link on iPhones when SMS app is not running in the background

How to import an asset in swift using Bundle.main.path() in a react-native native module

Failed to listen on localhost:8000 (reason: Cannot assign requested address)

Spring Boot JPA PostgreSQL Web App - Internal Authentication Error

ngClass error (Can't bind ngClass since it isn't a known property of div) in Angular 11.0.3

Using Response.Redirect with Friendly URLS in ASP.NET

Can a 32-bit antivirus program protect you from 64-bit threats

Double spacing in rmarkdown pdf

How to fix "pickle_module.load(f, **pickle_load_args) _pickle.UnpicklingError: invalid load key, '<'" using YOLOv3?

3D Touch Peek Swipe Like Mail

Bootstrap 5 Static Modal Still Closes when I Click Outside

Assembly definition can't resolve namespaces from external packages

Vector input in shiny R and then use it

Emulator wrong screen resolution in Android Studio 1.3

Svchost high CPU from Microsoft.BingWeather app errors

Graphics Context misaligned on first paint

Python connect to firebird docker database

Is this docker-for-mac password dialog legit?

How to save models trained locally in Amazon SageMaker?