How to recursively sort YAML (with anchors) using CommentedMap?
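I ran into a problem with the proposed recursive sorting solution.
I cannot sort a YAML file that has anchors and sub-elements. The .pop method call raises a KeyError exception.
For example: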
volvo:
  anchor_struct: &anchor_struct
    zzz:
      val: "bar"
    aaa:
      val: "foo"
  aaa: "Authorization"
  zzz: 341
  anchr_val: &anchor_val famous_val
lambo:
  <<: *anchor_struct
  mykey:
    myval:
      enabled: false
  anchor_struct:
    <<: *anchor_struct
    username: user
  anchor_val: *anchor_val
  zzz: zorglub
  www: web
  File "orderYaml.py", line 36, in recursive_sort_mappings
    value = s.pop(key)
  File "/usr/local/lib/python3.6/dist-packages/ruamel/yaml/comments.py", line 818, in __delitem__
    referer.update_key_value(key)
  File "/usr/local/lib/python3.6/dist-packages/ruamel/yaml/comments.py", line 947, in update_key_value
    ordereddict.__delitem__(self, key)
KeyError: 'aaa'
This error appears when the YAML file contains an extra element inside the anchored element, as shown below:
volvo:
  anchor_struct: &anchor_struct
    extra:
      zzz:
        val: "bar"
      aaa:
        val: "foo"
  aaa: "Authorization"
  zzz: 341
  anchr_val: &anchor_val famous_val
lambo:
  <<: *anchor_struct
  mykey:
    myval:
      enabled: false
  anchor_struct:
    <<: *anchor_struct
    username: user
  anchor_val: *anchor_val
  zzz: zorglub
  www: web
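Icing on the cake: is there a way to keep the anchor definitions (&...) on the "volvo" element after sorting? I want to manipulate the sorted result so that the "volvo" element always stays on top after processing.
My goal is to end up with this file sorted as follows: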
lambo:
  <<: *anchor_struct
  anchor_struct:
    <<: *anchor_struct
    mykey:
      myval:
        enabled: false
    username: user
  anchor_val: *anchor_val
  www: web
  zzz: zorglub
volvo:
  aaa: "Authorization"
  anchor_struct: &anchor_struct
    aaa:
      val: "foo"
    zzz:
      val: "bar"
  anchr_val: &anchor_val famous_val
  zzz: 341
Do you see any other solution? My goal is to verify that all our YAML files respect alphabetical order.
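The verification goal itself can be sketched independently of ruamel.yaml. The helper below is only an illustration under assumptions: `first_unsorted_path` is a hypothetical name, the data is assumed to be already loaded into plain `dict` and `list` objects with string keys, and merge keys are not modelled.

```python
def first_unsorted_path(node, path=()):
    """Return the path to the first mapping whose keys are not sorted, else None."""
    if isinstance(node, list):
        # Sequences keep their order; only their items are inspected.
        for index, item in enumerate(node):
            found = first_unsorted_path(item, path + (index,))
            if found is not None:
                return found
        return None
    if not isinstance(node, dict):
        return None
    keys = list(node)
    if keys != sorted(keys):
        return path
    for key in keys:
        found = first_unsorted_path(node[key], path + (key,))
        if found is not None:
            return found
    return None


good = {"a": {"x": 1, "y": 2}, "b": [{"k": 1, "m": 2}]}
bad = {"a": {"y": 2, "x": 1}, "b": 3}
print(first_unsorted_path(good))  # None
print(first_unsorted_path(bad))   # ('a',)
```

Returning a path rather than a boolean makes it easier to report where a file breaks the ordering; an empty tuple means the root mapping itself is out of order.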
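Edit #1:
Here is another example of what I am trying to achieve.
- I only want elements carrying the "&" attribute (anchor definitions) inside the top-level element "_world"
- There will be at most 30 different values carrying the "&" attribute
- The top element "_world" is explicitly named with the prefix "_" so that it always sorts to the top
- The other root elements use references to the anchors (via "<<: *")
- The output must not add lines or attributes
- The output must not modify attributes
- The output must sort all elements and their sub-elements (except arrays)
Here is an example of the expected input/output:
Input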
_world:
  anchor_struct: &anchor_struct
    foo:
      val: "foo"
    bar:
      val: "foo"
  string: "string"
  newmsg: &newmsg
    msg: "msg"
    foo: "foo"
    new: "new"
  anchr_val: &anchor_val famous_val
  bool: True
elem2:
  myStruct:
    <<: *anchor_struct
  anchor_val: *anchor_val
  <<: *anchor_struct
  zzz: zorglub
  www: web
  anchor_struct:
    <<: *anchor_struct
    other_elem: "other_elem"
elem1:
  <<: *anchor_struct
  zzz: zorglub
  newmsg:
    <<: *newmsg
    msg: "msg2"
  myStruct:
    <<: *anchor_struct
  anchor_struct:
    second_elem: "second_elem"
    <<: *anchor_struct
    other_elem: "other_elem"
  www: web
  anchor_val: *anchor_val
Expected output
_world:
  anchor_struct: &anchor_struct
    bar:
      val: "foo"
    foo:
      val: "foo"
  anchr_val: &anchor_val famous_val
  bool: True
  newmsg: &newmsg
    foo: "foo"
    msg: "msg"
    new: "new"
  string: "string"
elem1:
  <<: *anchor_struct
  anchor_struct:
    <<: *anchor_struct
    other_elem: "other_elem"
    second_elem: "second_elem"
  anchor_val: *anchor_val
  myStruct:
    <<: *anchor_struct
  newmsg:
    <<: *newmsg
    msg: "msg2"
  www: web
  zzz: zorglub
elem2:
  <<: *anchor_struct
  anchor_struct:
    <<: *anchor_struct
    other_elem: "other_elem"
  anchor_val: *anchor_val
  myStruct:
    <<: *anchor_struct
  www: web
  zzz: zorglub
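My approach to this kind of problem is to first add the expected and necessary imports, define the input and the expected output as multi-line strings, and add a useful diff method to the YAML instance.
When testing, string input is easier to work with than files: everything is in one file (need to strip some trailing whitespace?), and you cannot overwrite your input and start the next run with something different from the first.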
import sys
import difflib
import ruamel.yaml
from ruamel.yaml.comments import merge_attrib
yaml_in = """\
_world:
  anchor_struct: &anchor_struct
    foo:
      val: "foo"
    bar:
      val: "foo"
  string: "string"
  newmsg: &newmsg
    msg: "msg"
    foo: "foo"
    new: "new"
  anchr_val: &anchor_val famous_val
  bool: True
elem2:
  myStruct:
    <<: *anchor_struct
  anchor_val: *anchor_val
  <<: *anchor_struct
  zzz: zorglub
  www: web
  anchor_struct:
    <<: *anchor_struct
    other_elem: "other_elem"
elem1:
  <<: *anchor_struct
  zzz: zorglub
  newmsg:
    <<: *newmsg
    msg: "msg2"
  myStruct:
    <<: *anchor_struct
  anchor_struct:
    second_elem: "second_elem"
    <<: *anchor_struct
    other_elem: "other_elem"
  www: web
  anchor_val: *anchor_val
"""
yaml_out = """\
_world:
  anchor_struct: &anchor_struct
    bar:
      val: "foo"
    foo:
      val: "foo"
  anchr_val: &anchor_val famous_val
  bool: True
  newmsg: &newmsg
    foo: "foo"
    msg: "msg"
    new: "new"
  string: "string"
elem1:
  <<: *anchor_struct
  anchor_struct:
    <<: *anchor_struct
    other_elem: "other_elem"
    second_elem: "second_elem"
  anchor_val: *anchor_val
  myStruct:
    <<: *anchor_struct
  newmsg:
    <<: *newmsg
    msg: "msg2"
  www: web
  zzz: zorglub
elem2:
  <<: *anchor_struct
  anchor_struct:
    <<: *anchor_struct
    other_elem: "other_elem"
  anchor_val: *anchor_val
  myStruct:
    <<: *anchor_struct
  www: web
  zzz: zorglub
"""
def diff_yaml(self, data, s, fnin="in", fnout="out"):
    # dump data if necessary and compare with s
    inl = [l.rstrip() + '\n' for l in s.splitlines()]  # trailing space at end of line disregarded
    if not isinstance(data, str):
        buf = ruamel.yaml.compat.StringIO()
        self.dump(data, buf)
        outl = buf.getvalue().splitlines(True)
    else:
        outl = [l.rstrip() + '\n' for l in data.splitlines()]
    diff = difflib.unified_diff(inl, outl, fnin, fnout)
    result = True  # stays True only if the diff is empty
    for line in diff:
        sys.stdout.write(line)
        result = False
    return result
ruamel.yaml.YAML.diff = diff_yaml
yaml = ruamel.yaml.YAML()
# yaml.indent(mapping=4, sequence=4, offset=2)
yaml.boolean_representation = ["False", "True"]
yaml.preserve_quotes = True
Then make sure your expected output is valid and can be round-tripped:
dout = yaml.load(yaml_out)
buf = ruamel.yaml.compat.StringIO()
yaml.dump(dout, buf)
assert yaml.diff(dout, yaml_out)
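This should give neither output nor an assertion error (there is trailing whitespace in your expected output, as well as the non-default True boolean). If the expected output cannot be round-tripped, ruamel.yaml may not be able to dump your expected output.
If you get stuck, you can now inspect dout to determine what your parsed input should look like.
So now try: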
def recursive_sort_mappings(s):
    if isinstance(s, list):
        for elem in s:
            recursive_sort_mappings(elem)
        return
    if not isinstance(s, dict):
        return
    for key in sorted(s, reverse=True):
        value = s.pop(key)
        recursive_sort_mappings(value)
        s.insert(0, key, value)
din = yaml.load(yaml_in)
recursive_sort_mappings(din)
yaml.diff(din, yaml_out)
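This gives quite a bit of output, because recursive_sort_mappings does not know about merges and runs over all the keys, including merged-in ones. On top of that, the library tries to keep the merge key in its original position and, when a key is popped (before it is reinserted in the first position), does some magic in case the popped value exists in a merged mapping: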
--- in
+++ out
@@ -1,8 +1,8 @@
 _world:
   anchor_struct: &anchor_struct
-    bar:
+    bar: &id001
       val: "foo"
-    foo:
+    foo: &id002
       val: "foo"
   anchr_val: &anchor_val famous_val
   bool: True
@@ -14,24 +14,38 @@
 elem1:
   <<: *anchor_struct
   anchor_struct:
+    bar: *id001
     <<: *anchor_struct
+    foo: *id002
     other_elem: "other_elem"
     second_elem: "second_elem"
   anchor_val: *anchor_val
+  bar: *id001
+  foo: *id002
   myStruct:
     <<: *anchor_struct
+    bar: *id001
+    foo: *id002
   newmsg:
     <<: *newmsg
+    foo: "foo"
     msg: "msg2"
+    new: "new"
   www: web
   zzz: zorglub
 elem2:
-  <<: *anchor_struct
   anchor_struct:
     <<: *anchor_struct
+    bar: *id001
+    foo: *id002
     other_elem: "other_elem"
   anchor_val: *anchor_val
+  <<: *anchor_struct
+  bar: *id001
+  foo: *id002
   myStruct:
     <<: *anchor_struct
+    bar: *id001
+    foo: *id002
   www: web
   zzz: zorglub
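To solve this you need to do several things. First, you need to abandon .insert(), which emulates (for the Python 3 built-in OrderedDict) the method defined by the C ordereddict package ruamel.ordereddict. That emulation recreates the OrderedDict, and that leads to the duplication. The Python 3 C implementation is less powerful (than .insert()), but in this case it has a useful method, move_to_end (which could be used to update the .insert() emulation in ruamel.yaml).
Second, you need to walk over only the "real" keys, not the ones provided by merges, so you cannot iterate over the mapping itself (as for key in sorted(s, reverse=True) did in the first attempt).
Third, you need to move the merge key to the top of the mapping if it is somewhere else.
(The level parameter was added for debugging purposes.)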
def recursive_sort_mappings(s, level=0):
    if isinstance(s, list):
        for elem in s:
            recursive_sort_mappings(elem, level=level+1)
        return
    if not isinstance(s, dict):
        return
    merge = getattr(s, merge_attrib, [None])[0]
    if merge is not None and merge[0] != 0:  # << not in first position, move it
        setattr(s, merge_attrib, [(0, merge[1])])
    for key in sorted(s._ok):  # _ok -> set of Own Keys, i.e. not merged in keys
        value = s[key]
        # print('v1', level, key, super(ruamel.yaml.comments.CommentedMap, s).keys())
        recursive_sort_mappings(value, level=level+1)
        # print('v2', level, key, super(ruamel.yaml.comments.CommentedMap, s).keys())
        s.move_to_end(key)
din = yaml.load(yaml_in)
recursive_sort_mappings(din)
assert yaml.diff(din, yaml_out)
After that, the diff no longer gives any output.
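The move_to_end idea can be tried in isolation on a plain collections.OrderedDict, without ruamel.yaml. This is only a sketch of the reordering technique: `sort_in_place` is a hypothetical name, and merge keys and the `_ok` set are deliberately left out.

```python
from collections import OrderedDict


def sort_in_place(node):
    """Recursively sort mappings in place; sequences keep their order."""
    if isinstance(node, list):
        for item in node:
            sort_in_place(item)
        return
    if not isinstance(node, OrderedDict):
        return
    # sorted() builds a list first, so reordering inside the loop is safe.
    for key in sorted(node):
        sort_in_place(node[key])
        node.move_to_end(key)  # once every key has moved, the order is sorted


data = OrderedDict([
    ("zzz", 1),
    ("aaa", OrderedDict([
        ("foo", 1),
        ("bar", [OrderedDict([("y", 0), ("x", 0)])]),
    ])),
])
inner = data["aaa"]
sort_in_place(data)
print(list(data))            # ['aaa', 'zzz']
print(list(data["aaa"]))     # ['bar', 'foo']
print(data["aaa"] is inner)  # True
```

Because every key is moved to the end in ascending order, the mapping ends up sorted without any key being popped and without any mapping object being recreated, which appears to be the property that keeps anchors and aliases from being duplicated in the ruamel.yaml case.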