PyYAML interprets string as timestamp

PyYAML seems to interpret the string 10:01 as a duration in seconds:

>>> import yaml
>>> yaml.safe_load("time: 10:01")
{'time': 601}

This is not reflected in the official documentation: PyYAML documentation

Any suggestions on how to read 10:01 as a string?

Put it in quotes:

>>> import yaml
>>> yaml.safe_load('time: "10:01"')
{'time': '10:01'}

This tells YAML that the value is a literal string and stops it from trying to treat it as a number.
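
As a minimal sketch of the options, assuming PyYAML is installed: double quotes, single quotes, and the explicit !!str tag should all keep the value a string, while the plain scalar is resolved as a base-60 integer. safe_load is used here so the example does not depend on how a particular PyYAML version treats a bare yaml.load call; the variable names are only illustrative.

```python
import yaml

# Plain scalar: resolved by PyYAML's YAML 1.1 rules as a base-60 integer.
plain = yaml.safe_load("time: 10:01")

# Quoted scalars and the explicit !!str tag all stay strings.
double_quoted = yaml.safe_load('time: "10:01"')
single_quoted = yaml.safe_load("time: '10:01'")
tagged = yaml.safe_load("time: !!str 10:01")

print(plain)
print(double_quoted, single_quoted, tagged)
```

Quoting is the least invasive fix when you control the YAML file; the tag form is useful when a generator writes the file and quoting style is harder to control.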

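If you wish to monkeypatch the pyyaml library so that it does not have this behavior (since there is no neat way to do this), for a resolver of your choice, the code below works. The problem is that the regex that is used for int includes some code to match timestamps, even though there appears to be no spec for this behavior; it is just deemed "good practice" for strings like 30:00 and 40:11:11:11:11 to be treated as integers.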

import yaml
import re

def partition_list(somelist, predicate):
    truelist = []
    falselist = []
    for item in somelist:
        if predicate(item):
            truelist.append(item)
        else:
            falselist.append(item)
    return truelist, falselist

@classmethod
def init_implicit_resolvers(cls):
    """ 
    Create this class's own copy of yaml_implicit_resolvers from the superclass.
    Code taken from add_implicit_resolver; this should be refactored elsewhere.
    """
    if 'yaml_implicit_resolvers' not in cls.__dict__:
        implicit_resolvers = {}
        for key in cls.yaml_implicit_resolvers:
            implicit_resolvers[key] = cls.yaml_implicit_resolvers[key][:]
        cls.yaml_implicit_resolvers = implicit_resolvers

@classmethod
def remove_implicit_resolver(cls, tag, verbose=False):
    cls.init_implicit_resolvers()
    removed = {}
    for key in cls.yaml_implicit_resolvers:
        v = cls.yaml_implicit_resolvers[key]
        vremoved, v2 = partition_list(v, lambda x: x[0] == tag)
        if vremoved:
            cls.yaml_implicit_resolvers[key] = v2
            removed[key] = vremoved
    return removed

@classmethod
def _monkeypatch_fix_int_no_timestamp(cls):
    bad = '|[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+'
    for key in cls.yaml_implicit_resolvers:
        v = cls.yaml_implicit_resolvers[key]
        vcopy = v[:]
        n = 0
        for k in range(len(v)):
            if v[k][0] == 'tag:yaml.org,2002:int' and bad in v[k][1].pattern:
                n += 1
                p = v[k][1]
                p2 = re.compile(p.pattern.replace(bad,''), p.flags)
                vcopy[k] = (v[k][0], p2)    
        if n > 0:
            cls.yaml_implicit_resolvers[key] = vcopy

yaml.resolver.Resolver.init_implicit_resolvers = init_implicit_resolvers
yaml.resolver.Resolver.remove_implicit_resolver = remove_implicit_resolver
yaml.resolver.Resolver._monkeypatch_fix_int_no_timestamp = _monkeypatch_fix_int_no_timestamp

Then if you do this:

class MyResolver(yaml.resolver.Resolver):
    pass

t1 = MyResolver.remove_implicit_resolver('tag:yaml.org,2002:timestamp')
MyResolver._monkeypatch_fix_int_no_timestamp()

class MyLoader(yaml.SafeLoader, MyResolver):
    pass

text = '''
a: 3
b: 30:00
c: 30z
d: 40:11:11:11
'''

print(yaml.safe_load(text))
print(yaml.load(text, Loader=MyLoader))

it prints:

{'a': 3, 'b': 1800, 'c': '30z', 'd': 8680271}
{'a': 3, 'b': '30:00', 'c': '30z', 'd': '40:11:11:11'}

This shows that the default yaml behavior is left unchanged, while your private loader class handles these strings sanely.
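
To see why stripping that one alternative is enough, here is a small sketch using only the standard re module. The SEXAGESIMAL string is the same fragment the monkeypatch removes; DECIMAL is a simplified stand-in for the rest of the resolver's int pattern (the real one has more branches, such as hex and octal), so treat the combined patterns as an illustration rather than PyYAML's exact regex.

```python
import re

# The alternative that the monkeypatch above strips out of the int pattern.
SEXAGESIMAL = r'|[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+'

# Simplified stand-in for the int pattern: decimal integers only (assumption).
DECIMAL = r'[-+]?(?:0|[1-9][0-9_]*)'

with_sexagesimal = re.compile('^(?:' + DECIMAL + SEXAGESIMAL + ')$')
without_sexagesimal = re.compile('^(?:' + DECIMAL + ')$')

# For each scalar: (matches int before the patch, matches int after the patch).
results = {
    scalar: (bool(with_sexagesimal.match(scalar)),
             bool(without_sexagesimal.match(scalar)))
    for scalar in ['3', '30:00', '30z', '40:11:11:11']
}

for scalar, (before, after) in results.items():
    print(scalar, before, after)
```

Once no implicit resolver matches a plain scalar, it falls back to being a string, which is what the second line of output above shows for b and d.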

Since you are using a parser for YAML 1.1, you should expect it to implement what is indicated in the specification (example 2.19):

sexagesimal: 3:25:45

Sexagesimals are further explained here:

Using “:” allows expressing integers in base 60, which is convenient for time and angle values.

Not every detail implemented in PyYAML is in the documentation that you refer to; you should only see that as an introduction.
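
The base-60 reading can be checked by hand with a few lines of Python. This is only a sketch of the arithmetic, not PyYAML's own code: sexagesimal_to_int is a hypothetical helper name, and it ignores details such as the underscores that YAML 1.1 allows inside integers.

```python
def sexagesimal_to_int(text):
    """Read a colon-separated scalar such as '3:25:45' as a base-60 integer."""
    sign = -1 if text.startswith('-') else 1
    digits = text.lstrip('+-').split(':')
    value = 0
    for digit in digits:
        # Shift the running total one base-60 place, then add the next digit.
        value = value * 60 + int(digit)
    return sign * value

for scalar in ['10:01', '30:00', '3:25:45', '40:11:11:11']:
    print(scalar, '->', sexagesimal_to_int(scalar))
```

So 10:01 is read as 10 * 60 + 1 = 601, which is the value from the question.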


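You are not the only one who found this interpretation confusing, and in YAML 1.2 sexagesimals were dropped from the specification. Although that specification has been out for about eight years, the changes have never been implemented in PyYAML.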

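The easiest way to solve this is to upgrade to ruamel.yaml (disclaimer: I am the author of that package). You then get the YAML 1.2 behaviour (unless you explicitly specify that you want YAML 1.1), which interprets 10:01 as a string: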

from ruamel import yaml

import warnings
warnings.simplefilter('ignore', yaml.error.UnsafeLoaderWarning)

data = yaml.load("time: 10:01")
print(data)

which gives:

{'time': '10:01'}

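The warnings.simplefilter call is needed because you use .load() instead of .safe_load(). The former is unsafe and can lead to a wiped disk, or worse, when used on uncontrolled YAML input. There is seldom a reason not to use .safe_load().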