PyYAML interprets string as timestamp
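It looks as though PyYAML interprets the string 10:01 as a duration in seconds:
>>> import yaml
>>> yaml.load("time: 10:01")
{'time': 601}
The official documentation does not mention this: PyYAML documentation
Any suggestions on how to read 10:01 as a string?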
Put it in quotes:
>>> import yaml
>>> yaml.load('time: "10:01"')
{'time': '10:01'}
This tells YAML that the value is a literal string and stops it from trying to treat it as a number.
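Quoting is not the only way to mark the scalar as a string. As a small sketch (the key names below are made up for illustration), single quotes and the explicit `!!str` tag have the same effect, while the plain form is still resolved as an integer:

```python
import yaml

# Hypothetical key names; only the value syntax matters here.
document = """
double_quoted: "10:01"
single_quoted: '10:01'
tagged: !!str 10:01
plain: 10:01
"""

data = yaml.safe_load(document)
for key, value in data.items():
    # Show each value together with the Python type it was resolved to.
    print(key, repr(value), type(value).__name__)
```

Only the `plain` entry goes through implicit type resolution, so it is the only one that comes back as an `int`.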
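If you want to monkeypatch the pyyaml library so that it does not have this behavior (since there is no neat way to do this) for a resolver of your choice, the code below works. The problem is that the regex that is used for int includes some code to match timestamps. There does not seem to be a spec for this behavior; it is just deemed "good practice" for strings like 30:00 or 40:11:11:11:11 to be treated as integers.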
import re

import yaml


def partition_list(somelist, predicate):
    """Split somelist into (items matching predicate, items not matching)."""
    truelist = []
    falselist = []
    for item in somelist:
        if predicate(item):
            truelist.append(item)
        else:
            falselist.append(item)
    return truelist, falselist


@classmethod
def init_implicit_resolvers(cls):
    """
    creates own copy of yaml_implicit_resolvers from superclass
    code taken from add_implicit_resolvers; this should be refactored elsewhere
    """
    if 'yaml_implicit_resolvers' not in cls.__dict__:
        implicit_resolvers = {}
        for key in cls.yaml_implicit_resolvers:
            implicit_resolvers[key] = cls.yaml_implicit_resolvers[key][:]
        cls.yaml_implicit_resolvers = implicit_resolvers


@classmethod
def remove_implicit_resolver(cls, tag, verbose=False):
    """Drop every implicit resolver for tag; return what was removed."""
    cls.init_implicit_resolvers()
    removed = {}
    for key in cls.yaml_implicit_resolvers:
        v = cls.yaml_implicit_resolvers[key]
        vremoved, v2 = partition_list(v, lambda x: x[0] == tag)
        if vremoved:
            cls.yaml_implicit_resolvers[key] = v2
            removed[key] = vremoved
    return removed


@classmethod
def _monkeypatch_fix_int_no_timestamp(cls):
    """Strip the colon-separated (base 60) alternative from the int regex."""
    # Work on this class's own copy so the parent resolver stays untouched.
    cls.init_implicit_resolvers()
    bad = '|[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+'
    for key in cls.yaml_implicit_resolvers:
        v = cls.yaml_implicit_resolvers[key]
        vcopy = v[:]
        n = 0
        for k in range(len(v)):
            if v[k][0] == 'tag:yaml.org,2002:int' and bad in v[k][1].pattern:
                n += 1
                p = v[k][1]
                p2 = re.compile(p.pattern.replace(bad, ''), p.flags)
                vcopy[k] = (v[k][0], p2)
        if n > 0:
            cls.yaml_implicit_resolvers[key] = vcopy


# Attach the helpers to the base Resolver so subclasses inherit them.
yaml.resolver.Resolver.init_implicit_resolvers = init_implicit_resolvers
yaml.resolver.Resolver.remove_implicit_resolver = remove_implicit_resolver
yaml.resolver.Resolver._monkeypatch_fix_int_no_timestamp = _monkeypatch_fix_int_no_timestamp
Then if you do this:
class MyResolver(yaml.resolver.Resolver):
    pass

# Remove timestamp resolution and the base 60 int form from MyResolver only.
t1 = MyResolver.remove_implicit_resolver('tag:yaml.org,2002:timestamp')
MyResolver._monkeypatch_fix_int_no_timestamp()

class MyLoader(yaml.SafeLoader, MyResolver):
    pass

text = '''
a: 3
b: 30:00
c: 30z
d: 40:11:11:11
'''
print(yaml.safe_load(text))
print(yaml.load(text, Loader=MyLoader))
it prints
{'a': 3, 'b': 1800, 'c': '30z', 'd': 8680271}
{'a': 3, 'b': '30:00', 'c': '30z', 'd': '40:11:11:11'}
showing that the default yaml behavior is left unchanged, but your private loader class handles these strings sanely.
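To see why removing that one alternative is enough, the regex can be tested on its own, without PyYAML. The pattern below is reproduced from memory of PyYAML's `resolver.py`, so treat its exact text as an assumption and check it against your installed version. The names `INT_PATTERN`, `BAD`, `original` and `patched` are only for this sketch:

```python
import re

# Assumed copy of PyYAML's implicit int regex (verify against resolver.py).
INT_PATTERN = r'''^(?:[-+]?0b[0-1_]+
    |[-+]?0[0-7_]+
    |[-+]?(?:0|[1-9][0-9_]*)
    |[-+]?0x[0-9a-fA-F_]+
    |[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+)$'''

# The alternative that accepts colon-separated (base 60) integers.
BAD = '|[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+'

original = re.compile(INT_PATTERN, re.X)
patched = re.compile(INT_PATTERN.replace(BAD, ''), re.X)

for text in ['3', '0x1F', '30:00', '40:11:11:11', '30z']:
    # A match means the scalar would be tagged as an int.
    print(text, bool(original.match(text)), bool(patched.match(text)))
```

Plain decimal and hexadecimal integers match both patterns, while `30:00` and `40:11:11:11` match only the original one, so after the patch they fall through to being strings.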
Since you are using a parser for YAML 1.1, you should expect it to implement what is indicated in the specification (example 2.19):
sexagesimal: 3:25:45
Sexagesimals are explained further here:
Using “:” allows expressing integers in base 60, which is convenient for time and angle values.
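The base 60 rule explains the numbers seen above. A minimal sketch of the conversion (a simplified stand-in written for illustration, not PyYAML's actual constructor code; `sexagesimal_to_int` is a hypothetical helper name):

```python
def sexagesimal_to_int(text):
    """Convert a colon-separated base 60 string such as '3:25:45' to an int."""
    sign = -1 if text.startswith('-') else 1
    # YAML 1.1 allows '_' as a visual separator inside integers.
    digits = text.lstrip('+-').replace('_', '').split(':')
    value = 0
    for digit in digits:
        # Each ':' shifts the running value by one base 60 position.
        value = value * 60 + int(digit)
    return sign * value


print(sexagesimal_to_int('10:01'))    # 10 * 60 + 1 = 601
print(sexagesimal_to_int('3:25:45'))  # 3 * 3600 + 25 * 60 + 45 = 12345
```

Read this way, 10:01 is ten minutes and one second expressed in seconds, which is why the question's value loads as 601.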
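Not every detail that is implemented in PyYAML is in the documentation you refer to; you should see that only as an introduction.
You are not the only one who found this interpretation confusing, and in YAML 1.2 sexagesimals were dropped from the specification. Although that specification has been out for about eight years, the changes have never been implemented in PyYAML.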
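The easiest way to solve this is to upgrade to ruamel.yaml (disclaimer: I am the author of that package). You then get YAML 1.2 behavior (unless you explicitly specify that you want YAML 1.1), which interprets 10:01 as a string:
from ruamel import yaml
import warnings
warnings.simplefilter('ignore', yaml.error.UnsafeLoaderWarning)
data = yaml.load("time: 10:01")
print(data)
which gives:
{'time': '10:01'}
The warnings.simplefilter call is only necessary because you use .load() instead of .safe_load(). The former is unsafe and can lead to a wiped disk, or worse, when used on uncontrolled YAML input. There is seldom a reason not to use .safe_load().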