How to recursively sort YAML (with anchors) using CommentedMap?

I am having a problem with the recursive sort solution that was proposed.

I cannot sort YAML files that contain anchors and sub-elements: the .pop method call raises a KeyError exception.

For example:

volvo:
  anchor_struct: &anchor_struct
    zzz:
      val: "bar"
    aaa:
      val: "foo"
  aaa: "Authorization"
  zzz: 341
  anchr_val: &anchor_val famous_val
  
lambo:
  <<: *anchor_struct
  mykey:
    myval:
      enabled: false
  anchor_struct:
    <<: *anchor_struct
    username: user
  anchor_val: *anchor_val
  zzz: zorglub
  www: web

Sorting this fails with (traceback excerpt):

  File "orderYaml.py", line 36, in recursive_sort_mappings
    value = s.pop(key)
  File "/usr/local/lib/python3.6/dist-packages/ruamel/yaml/comments.py", line 818, in __delitem__
    referer.update_key_value(key)
  File "/usr/local/lib/python3.6/dist-packages/ruamel/yaml/comments.py", line 947, in update_key_value
    ordereddict.__delitem__(self, key)
KeyError: 'aaa'
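The KeyError can be reproduced without ruamel.yaml. As a rough analogy only (this is not how CommentedMap is implemented internally), collections.ChainMap behaves the same way. Iterating over it yields the keys of every underlying mapping, but pop() only removes keys from the first one. Here own stands in for lambo's own keys and merged for the mapping pulled in by <<: *anchor_struct. Both names are hypothetical.

```python
from collections import ChainMap

# Hypothetical stand-ins: keys written directly under "lambo" ...
own = {"zzz": "zorglub", "www": "web"}
# ... and keys that only arrive through "<<: *anchor_struct".
merged = {"zzz": {"val": "bar"}, "aaa": {"val": "foo"}}

# Own keys shadow merged ones, similar to how a merge key resolves lookups.
view = ChainMap(own, merged)

print(sorted(view))  # ['aaa', 'www', 'zzz']: iteration includes merged-in "aaa"

try:
    view.pop("aaa")  # pop() only looks in the first mapping (own)
except KeyError as exc:
    print("KeyError:", exc)
```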

This error occurs when the YAML file contains an extra element inside the anchored element, as shown below:

volvo:
  anchor_struct: &anchor_struct
    extra:
      zzz:
        val: "bar"
      aaa:
        val: "foo"
  aaa: "Authorization"
  zzz: 341
  anchr_val: &anchor_val famous_val
  
lambo:
  <<: *anchor_struct
  mykey:
    myval:
      enabled: false
  anchor_struct:
    <<: *anchor_struct
    username: user
  anchor_val: *anchor_val
  zzz: zorglub
  www: web

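Icing on the cake: is there a way to keep the anchor definitions (&...) on the "volvo" element after sorting? I want to manipulate the sorted result so that the "volvo" element always stays at the top after processing.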
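One possible approach to that part, offered as a hedged sketch and not tested against ruamel.yaml here, is to sort first and then move the key you want on top back to the front with move_to_end(key, last=False), which OrderedDict provides. The example uses a plain collections.OrderedDict, and sort_keeping_first is a hypothetical helper name. Whether ruamel.yaml then emits the &anchor definitions on "volvo" is an assumption. YAML requires an anchor to appear before any alias that refers to it, so the anchored element generally has to be dumped first anyway. A key name such as _world (used in the edit below) already sorts before lowercase names, which avoids the extra step.

```python
from collections import OrderedDict


def sort_keeping_first(d, first):
    """Hypothetical helper: sort d in place, then move `first` back to the front."""
    for key in sorted(d):
        d.move_to_end(key)  # visiting keys in sorted order leaves d sorted
    if first in d:
        d.move_to_end(first, last=False)  # last=False moves the key to the beginning
    return d


doc = OrderedDict([("volvo", {}), ("lambo", {})])
sort_keeping_first(doc, "volvo")
print(list(doc))  # ['volvo', 'lambo']

# "_" sorts before lowercase letters, so such a key ends up first on its own
print(sorted(["volvo", "lambo", "_world"]))  # ['_world', 'lambo', 'volvo']
```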

My goal is to get this file sorted as follows:

lambo:
  <<: *anchor_struct
  anchor_struct:
    <<: *anchor_struct
    username: user
  anchor_val: *anchor_val
  mykey:
    myval:
      enabled: false
  www: web
  zzz: zorglub

volvo:
  aaa: "Authorization"
  anchor_struct: &anchor_struct
    aaa:
      val: "foo"
    zzz:
      val: "bar"
  anchr_val: &anchor_val famous_val
  zzz: 341

Do you see another solution? My goal is to verify that all our YAML files respect alphabetical order.
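If the goal is only to verify the order rather than rewrite the files, a read-only check may be enough. The sketch below (find_unsorted is a hypothetical name) walks the nested dicts and lists that a YAML loader returns and reports the path of every mapping whose keys are not in sorted order. Keys are compared as strings, so "<<" and "_world" sort before lowercase names. Merge keys are an open caveat: a loader that resolves << flattens the merged-in keys into the mapping, so for merged mappings you would probably want to check only the own keys.

```python
from typing import Any, List, Tuple


def find_unsorted(node: Any, path: Tuple = ()) -> List[Tuple]:
    """Return the paths of all mappings whose keys are not in sorted order."""
    problems = []
    if isinstance(node, dict):
        keys = [str(k) for k in node]
        if keys != sorted(keys):
            problems.append(path)
        for key, value in node.items():
            problems.extend(find_unsorted(value, path + (key,)))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            problems.extend(find_unsorted(item, path + (index,)))
    return problems


data = {"volvo": {"zzz": 341, "aaa": "Authorization"}, "lambo": {"www": "web"}}
print(find_unsorted(data))  # [(), ('volvo',)]
```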


Edit #1:

Here is another example of what I am trying to achieve.

This is the expected input/output example:

Input

_world:
  anchor_struct: &anchor_struct
    foo:
      val: "foo"
    bar:
      val: "foo"
  string: "string"
  newmsg: &newmsg
    msg: "msg"
    foo: "foo"
    new: "new"
  anchr_val: &anchor_val famous_val
  bool: True
elem2:
  myStruct:
    <<: *anchor_struct
  anchor_val: *anchor_val
  <<: *anchor_struct
  zzz: zorglub
  www: web
  anchor_struct:
    <<: *anchor_struct
    other_elem: "other_elem"
elem1:
  <<: *anchor_struct
  zzz: zorglub
  newmsg: 
    <<: *newmsg
    msg: "msg2"
  myStruct:
    <<: *anchor_struct
  anchor_struct:
    second_elem: "second_elem"
    <<: *anchor_struct
    other_elem: "other_elem"
  www: web
  anchor_val: *anchor_val

Expected output

_world:
  anchor_struct: &anchor_struct
    bar:
      val: "foo"
    foo:
      val: "foo"
  anchr_val: &anchor_val famous_val
  bool: True
  newmsg: &newmsg
    foo: "foo"
    msg: "msg"
    new: "new"
  string: "string"
elem1:
  <<: *anchor_struct
  anchor_struct:
    <<: *anchor_struct
    other_elem: "other_elem"
    second_elem: "second_elem"
  anchor_val: *anchor_val
  myStruct:
    <<: *anchor_struct
  newmsg: 
    <<: *newmsg
    msg: "msg2"
  www: web
  zzz: zorglub
elem2:
  <<: *anchor_struct
  anchor_struct:
    <<: *anchor_struct
    other_elem: "other_elem"
  anchor_val: *anchor_val
  myStruct:
    <<: *anchor_struct
  www: web
  zzz: zorglub

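My approach to this kind of problem is to first add the necessary imports, define the input and the expected output as multi-line strings, and add a useful diff method to the YAML instance.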

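When testing, string input is easier to work with than files. Everything is in one file (you may need to remove some trailing whitespace), and you cannot accidentally overwrite your input and start the next run with different content than the first.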

import sys
import difflib
import ruamel.yaml
from ruamel.yaml.comments import merge_attrib

yaml_in = """\
_world:
  anchor_struct: &anchor_struct
    foo:
      val: "foo"
    bar:
      val: "foo"
  string: "string"
  newmsg: &newmsg
    msg: "msg"
    foo: "foo"
    new: "new"
  anchr_val: &anchor_val famous_val
  bool: True
elem2:
  myStruct:
    <<: *anchor_struct
  anchor_val: *anchor_val
  <<: *anchor_struct
  zzz: zorglub
  www: web
  anchor_struct:
    <<: *anchor_struct
    other_elem: "other_elem"
elem1:
  <<: *anchor_struct
  zzz: zorglub
  newmsg: 
    <<: *newmsg
    msg: "msg2"
  myStruct:
    <<: *anchor_struct
  anchor_struct:
    second_elem: "second_elem"
    <<: *anchor_struct
    other_elem: "other_elem"
  www: web
  anchor_val: *anchor_val
"""

yaml_out = """\
_world:
  anchor_struct: &anchor_struct
    bar:
      val: "foo"
    foo:
      val: "foo"
  anchr_val: &anchor_val famous_val
  bool: True
  newmsg: &newmsg
    foo: "foo"
    msg: "msg"
    new: "new"
  string: "string"
elem1:
  <<: *anchor_struct
  anchor_struct:
    <<: *anchor_struct
    other_elem: "other_elem"
    second_elem: "second_elem"
  anchor_val: *anchor_val
  myStruct:
    <<: *anchor_struct
  newmsg: 
    <<: *newmsg
    msg: "msg2"
  www: web
  zzz: zorglub
elem2:
  <<: *anchor_struct
  anchor_struct:
    <<: *anchor_struct
    other_elem: "other_elem"
  anchor_val: *anchor_val
  myStruct:
    <<: *anchor_struct
  www: web
  zzz: zorglub
"""


def diff_yaml(self, data, s, fnin="in", fnout="out"):
    # dump data if necessary and compare with s
    inl = [l.rstrip() + '\n' for l in s.splitlines()]   # trailing space at end of line disregarded
    if not isinstance(data, str):
        buf = ruamel.yaml.compat.StringIO()
        self.dump(data, buf)
        outl = buf.getvalue().splitlines(True)
    else:
        outl = [l.rstrip() + '\n' for l in data.splitlines()]
    diff = difflib.unified_diff(inl, outl, fnin, fnout)
    result = True
    for line in diff:
        sys.stdout.write(line)
        result = False
    return result

ruamel.yaml.YAML.diff = diff_yaml

yaml = ruamel.yaml.YAML()
# yaml.indent(mapping=4, sequence=4, offset=2)
yaml.boolean_representation = ["False", "True"]
yaml.preserve_quotes = True

Then make sure your expected output is valid and can be round-tripped:

dout = yaml.load(yaml_out)
buf = ruamel.yaml.compat.StringIO()
yaml.dump(dout, buf)
assert yaml.diff(dout, yaml_out)

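This should give neither output nor an assertion error. The trailing whitespace in the expected output is handled by the rstrip() in diff_yaml, and the non-default True boolean is handled by yaml.boolean_representation. If the expected output cannot be round-tripped, ruamel.yaml may not be able to dump your expected output at all.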
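The comparison in diff_yaml ignores trailing whitespace by normalising lines with rstrip() before passing them to difflib.unified_diff. The stdlib-only sketch below normalises both sides, unlike diff_yaml, which only normalises text input. It shows why the trailing space after newmsg: in the expected output does not produce a spurious difference. The helper names are hypothetical.

```python
import difflib


def normalise(text):
    # Strip trailing whitespace from every line and keep a uniform line end
    return [line.rstrip() + "\n" for line in text.splitlines()]


def same_ignoring_trailing_ws(expected, actual):
    # unified_diff yields nothing when the two line lists are identical
    diff = list(difflib.unified_diff(normalise(expected), normalise(actual), "in", "out"))
    return not diff


print(same_ignoring_trailing_ws("newmsg: \n  msg: x\n", "newmsg:\n  msg: x\n"))  # True
print(same_ignoring_trailing_ws("a: 1\n", "a: 2\n"))  # False
```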

If you get stuck, you can now inspect dout to see what your parsed input should look like.

So now try:

def recursive_sort_mappings(s):
    if isinstance(s, list):
        for elem in s:
            recursive_sort_mappings(elem)
        return 
    if not isinstance(s, dict):
        return
    for key in sorted(s, reverse=True):
        value = s.pop(key)
        recursive_sort_mappings(value)
        s.insert(0, key, value)

din = yaml.load(yaml_in)
recursive_sort_mappings(din)
yaml.diff(din, yaml_out)

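This gives quite a lot of output, because recursive_sort_mappings doesn't know about merges. It runs over all the keys, the merge keys are kept in their original position, and additionally, when a key is popped (before being reinserted in the first position), some magic happens if the popped value exists in a merged-in mapping: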
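Why the naive loop turns merged-in keys into own keys can again be illustrated with collections.ChainMap. This is a rough analogy with hypothetical names, not ruamel.yaml's internals. Reading a key through the combined view finds the merged-in value, and writing it back stores it in the first (own) mapping. The same value object ends up in two places, which corresponds to the extra bar: *id001 / foo: *id002 lines in the diff.

```python
from collections import ChainMap

own = {"www": "web", "zzz": "zorglub"}                    # keys written under elem2
merged = {"bar": {"val": "foo"}, "foo": {"val": "foo"}}  # from <<: *anchor_struct
view = ChainMap(own, merged)

# Naive approach: visit *all* keys, including merged ones, and store them again.
for key in sorted(view):   # sorted() builds a list first, so mutating is safe
    value = view[key]      # "bar"/"foo" are found in the merged mapping
    view[key] = value      # ...but assignment always writes to own

print(sorted(own))  # ['bar', 'foo', 'www', 'zzz']: merged keys became own keys
```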

--- in
+++ out
@@ -1,8 +1,8 @@
 _world:
   anchor_struct: &anchor_struct
-    bar:
+    bar: &id001
       val: "foo"
-    foo:
+    foo: &id002
       val: "foo"
   anchr_val: &anchor_val famous_val
   bool: True
@@ -14,24 +14,38 @@
 elem1:
   <<: *anchor_struct
   anchor_struct:
+    bar: *id001
     <<: *anchor_struct
+    foo: *id002
     other_elem: "other_elem"
     second_elem: "second_elem"
   anchor_val: *anchor_val
+  bar: *id001
+  foo: *id002
   myStruct:
     <<: *anchor_struct
+    bar: *id001
+    foo: *id002
   newmsg:
     <<: *newmsg
+    foo: "foo"
     msg: "msg2"
+    new: "new"
   www: web
   zzz: zorglub
 elem2:
-  <<: *anchor_struct
   anchor_struct:
     <<: *anchor_struct
+    bar: *id001
+    foo: *id002
     other_elem: "other_elem"
   anchor_val: *anchor_val
+  <<: *anchor_struct
+  bar: *id001
+  foo: *id002
   myStruct:
     <<: *anchor_struct
+    bar: *id001
+    foo: *id002
   www: web
   zzz: zorglub

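There are several things you need to do to fix this. First, you need to drop .insert(). For Python 3's built-in OrderedDict, it emulates the method defined on the C ordereddict package ruamel.ordereddict. That emulation recreates the OrderedDict, and this leads to the duplication. The Python 3 C implementation offers move_to_end. That method is less powerful than .insert() but useful in this case, and it could be used to update the .insert() emulation in ruamel.yaml.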

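Second, you need to iterate only over the "real" keys, not the ones provided by the merge, so you cannot simply use for key in over the mapping itself.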

Third, if the merge key is somewhere else, you need to move it to the top of the mapping.
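The effect of the first point can be seen on a plain collections.OrderedDict. Visiting the keys in sorted order and moving each one to the end leaves the mapping sorted in place, without rebuilding it. This stdlib-only sketch (sort_in_place is a hypothetical name) shows just that mechanism. The final version below calls the same move_to_end method on the CommentedMap.

```python
from collections import OrderedDict


def sort_in_place(d):
    # After each key has been moved to the end in sorted order,
    # the keys appear sorted; the same dict object is reordered.
    for key in sorted(d):
        d.move_to_end(key)
    return d


d = OrderedDict([("zzz", 341), ("aaa", "Authorization"), ("anchr_val", "famous_val")])
sort_in_place(d)
print(list(d))  # ['aaa', 'anchr_val', 'zzz']
```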

(A level argument has been added for debugging purposes.)

def recursive_sort_mappings(s, level=0):
    if isinstance(s, list):
        for elem in s:
            recursive_sort_mappings(elem, level=level+1)
        return
    if not isinstance(s, dict):
        return
    # list of (position, merged mapping) pairs; absent or empty if there is no <<
    merges = getattr(s, merge_attrib, None) or []
    if merges and merges[0][0] != 0:  # << not in first position, move it
        setattr(s, merge_attrib, [(0, m[1]) for m in merges])

    for key in sorted(s._ok):  # _ok -> set of Own Keys, i.e. not merged in keys
        value = s[key]
        # print('v1', level, key, super(ruamel.yaml.comments.CommentedMap, s).keys())
        recursive_sort_mappings(value, level=level+1)
        # print('v2', level, key, super(ruamel.yaml.comments.CommentedMap, s).keys())
        s.move_to_end(key)  # reorder in place instead of pop()/insert()


din = yaml.load(yaml_in)
recursive_sort_mappings(din)
assert yaml.diff(din, yaml_out)

After that, the diff no longer gives any output.
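The three rules can also be tried out on a small stdlib-only model. ToyMap is a hypothetical stand-in for CommentedMap. own holds the keys written in the mapping. merges holds (position, mapping) pairs, like the merge_attrib list used above. render() lists the keys in the order they would be dumped. It models only the idea, not ruamel.yaml's internals.

```python
from collections import OrderedDict


class ToyMap:
    """Hypothetical model: own keys plus (position, mapping) merge entries."""

    def __init__(self, own, merges=()):
        self.own = OrderedDict(own)
        self.merges = list(merges)

    def render(self):
        # Dump order: own keys, with "<<" inserted at the recorded position
        keys = list(self.own)
        if self.merges:
            keys.insert(self.merges[0][0], "<<")
        return keys


def sort_toy(node):
    if isinstance(node, list):
        for item in node:
            sort_toy(item)
        return
    if not isinstance(node, ToyMap):
        return
    # Put the merge key at the top of the mapping
    node.merges = [(0, m) for _, m in node.merges]
    # Iterate over own keys only; merged-in keys are never copied or moved
    for key in sorted(node.own):
        sort_toy(node.own[key])
        node.own.move_to_end(key)  # reorder in place, as with move_to_end above


anchor = ToyMap({"foo": 1, "bar": 2})
elem2 = ToyMap({"myStruct": 0, "anchor_val": 0, "zzz": 0, "www": 0}, merges=[(2, anchor)])
sort_toy(elem2)
print(elem2.render())    # ['<<', 'anchor_val', 'myStruct', 'www', 'zzz']
print(list(anchor.own))  # ['foo', 'bar']: sorted where it is defined, not here
```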