How to change an anchored scalar in a sequence without destroying the anchor in ruamel.yaml?

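Using ruamel.yaml version 0.15.92 with Python 3.6.6 on CentOS 7, I cannot seem to update the value of an anchored scalar in a sequence without either destroying the anchor itself or creating invalid YAML in the next dump.

I have tried recreating the original node type with the new value (old PlainScalarString -> new PlainScalarString, old FoldedScalarString -> new FoldedScalarString, and so on) and copying the anchor to it. While this restores the anchor on the updated scalar value, it also creates invalid YAML, because the first alias later in the YAML file duplicates the same anchor name and assigns to it the old value of the scalar I am trying to update.

I then tried replacing all affected aliases with the actual alias text, such as *anchor_name, but this causes the value to be quoted, like '*anchor_name', rendering the alias useless.

I reverted that and then tried to suppress the duplicate anchor name (by setting always_dump=False on each affected alias). While this does suppress the duplicate anchor name, it unfortunately just dumps the old value of the anchored scalar.

My entire test data is as follows; assume it is named test.yaml: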

# Header comment
---
# Post-header comment

# Reusable aliases
aliases:
  - &plain_value This is unencrypted
  - &string_password ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAYnFbMveZGBgd9aw7h4VV+M202zRdcP96UQs1q+ViznJK2Ee08hoW9jdIqVhNaecYALUihKjVYijJa649VF7BLZXV0svLEHD8LZeduoLS3iC9uszdhDFB2Q6R/Vv/ARjHNoWc6/D0nFN9vwcrQNITnvREl0WXYpR9SmW0krUpyr90gSAxTxPNJVlEOtA0afeJiXOtQEu/b8n+UDM3eXXRO+2SEXM4ub7fNcj6V9DgT3WwKBUjqzQ5DicnB19FNQ1cBGcmCo8qRv0JtbVqZ4+WJFGc06hOTcAJPsAaWWUn80ChcTnl4ELNzpJFoxAxHgepirskuIvuWZv3h/PL8Ez3NDBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBBSuVIsvWXMmdFJtJmtJxXxgCAGFCioe/zdphGqynmj6vVDnCjA3Xc0VPOCmmCl/cTKdg==]
  - &block_password >
    ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEw
    DQYJKoZIhvcNAQEBBQAEggEAojErrxuNcdX6oR+VA/I3PyuV2CwXx166nIUp
    asEHo1/CiCIoE3qCnjK2FJF8vg+l3AqRmdb7vYrqQ+30RFfHSlB9zApSw8NW
    tnEpawX4hhKAxnTc/JKStLLu2k7iZkhkor/UA2HeVJcCzEeYAwuOQRPaolmQ
    TGHjvm2w6lhFDKFkmETD/tq4gQNcOgLmJ+Pqhogr/5FmGOpJ7VGjpeUwLteM
    er3oQozp4l2bUTJ8wk9xY6cN+eeOIcWXCPPdNetoKcVropiwrYH8QV4CZ2Ky
    u0vpiybEuBCKhr1EpfqhrtuG5s817eOb7+Wf5ctR0rPuxlTUqdnDY31zZ3Kb
    mcjqHDBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBBATq6BjaxU2bfcLL5S
    bxzsgCDsWzggzxsCw4Dp0uYLwvMKjJEpMLeFXGrLHJzTF6U2Nw==]

top_key: unencrypted value
top_alias: *plain_value

top::hash:
  ignore: more
  # This pulls its string-form value from above
  stringified_alias: *string_password
  sub:
    ignore: value
    key: unencrypted subbed-value
    # This pulls its block-form value from above
    blocked_alias: *block_password
  sub_more:
    # This is a stringified EYAML value, NOT an alias
    inline_string: ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAafmyrrae2kx8HdyPmn/RHQRcTPhqpx5Idm12hCDCIbwVM++H+c620z4EN2wlugz/GcLaiGsybaVWzAZ+3r+1+EwXn5ec4dJ5TTqo7oxThwUMa+SHliipDJwGoGii/H+y2I+3+irhDYmACL2nyJ4dv4IUXwqkv6nh1J9MwcOkGES2SKiDm/WwfkbPIZc3ccp1FI9AX/m3SVqEcvsrAfw6HtkolM22csfuJREHkTp7nBapDvOkWn4plzfOw9VhPKhq1x9DUCVFqqG/HAKv++v4osClK6k1MmSJWaMHrW1z3n7LftV9ZZ60E0Cgro2xSaD+itRwBp07H0GeWuoKB4+44TBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBCRv9r2lvQ1GJMoD064EtdigCCw43EAKZWOc41yEjknjRaWDm1VUug6I90lxCsUrxoaMA==]
    # Also NOT an alias, in block form
    block_string: >
      ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEw
      DQYJKoZIhvcNAQEBBQAEggEAafmyrrae2kx8HdyPmn/RHQRcTPhqpx5Idm12
      hCDCIbwVM++H+c620z4EN2wlugz/GcLaiGsybaVWzAZ+3r+1+EwXn5ec4dJ5
      TTqo7oxThwUMa+SHliipDJwGoGii/H+y2I+3+irhDYmACL2nyJ4dv4IUXwqk
      v6nh1J9MwcOkGES2SKiDm/WwfkbPIZc3ccp1FI9AX/m3SVqEcvsrAfw6Htko
      lM22csfuJREHkTp7nBapDvOkWn4plzfOw9VhPKhq1x9DUCVFqqG/HAKv++v4
      osClK6k1MmSJWaMHrW1z3n7LftV9ZZ60E0Cgro2xSaD+itRwBp07H0GeWuoK
      B4+44TBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBCRv9r2lvQ1GJMoD064
      EtdigCCw43EAKZWOc41yEjknjRaWDm1VUug6I90lxCsUrxoaMA==]

# Signature line

This issue takes two forms, so here are two code samples that reproduce the conditions:

First, "How can we most simply update the value of an anchored scalar in a sequence without destroying the anchor or its aliases?" This looks like:

import sys
from ruamel.yaml import YAML

yaml = YAML()

with open('test.yaml', 'r') as f:
  yaml_data = yaml.load(f)

yaml_data['aliases'][1] = "New string password"
yaml.dump(yaml_data, sys.stdout)

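Note that this destroys the anchor. I would very much like the solution to look as close to this first snippet as possible; perhaps something like yaml_data['aliases'][1].set_value("New string password") # Changes only the scalar value while preserving the original anchor, comments, position, et al.

Second, "If we must instead wrap the new value in some object to preserve the anchor (and other attributes of the entry being replaced), what is the simplest approach which also preserves all aliases that refer to it (such that they adopt the updated value) when dumped?" My attempt at solving this requires quite a lot of code, including a recursive function. Since SO guidelines discourage large code dumps, I will provide only the relevant bits. Please assume that any code not listed works well.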

from collections import deque
from ruamel.yaml.scalarstring import PlainScalarString, FoldedScalarString

### <snip def FindEYAMLPaths(...) returns lists of paths through the YAML to every value starting with 'ENC['>
### <snip def GetYAMLValue(...) returns the node -- as a PlainScalarString, FoldedScalarString, et al. -- identified by a path from FindEYAMLPaths>
### <snip def DisableAnchorDump(...) sets `anchor.always_dump=False` if the node has an anchor attribute>

def ReplaceYAMLValue(value, data, path=None):
  if path is None:
    return

  ref = data
  last_ref = path.pop()
  for p in path:
    ref = ref[p]

  # All I'm trying to do here is change the scalar value without disrupting its comments, anchor, positioning, or any of its aliases.
  # This succeeds in changing the scalar value and preserving its original anchor, but disrupts its aliases which insist on preserving the old value.
  if isinstance(ref[last_ref], PlainScalarString):
    ref[last_ref] = PlainScalarString(value, anchor=ref[last_ref].anchor.value)
  elif isinstance(ref[last_ref], FoldedScalarString):
    ref[last_ref] = FoldedScalarString(value, anchor=ref[last_ref].anchor.value)
  else:
    ref[last_ref] = value


with open('test.yaml', 'r') as f:
  yaml_data = yaml.load(f)

seen_anchors = []
for path in FindEYAMLPaths(yaml_data):
  if path is None:
    continue

  node = GetYAMLValue(yaml_data, deque(path))
  if hasattr(node, 'anchor'):
    test_anchor = node.anchor.value
    if test_anchor is not None:
      if test_anchor in seen_anchors:
        # This is expected to just be an alias, pointing at the newly updated anchor
        DisableAnchorDump(node)
        continue
      seen_anchors.append(test_anchor)

  ReplaceYAMLValue("New string password", yaml_data, path)

yaml.dump(yaml_data, sys.stdout)

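Note that this produces valid YAML, except that all of the affected aliases are gone, replaced by the old value of the anchored scalar.

I expect to be able to change the value of an aliased scalar in a sequence without disrupting any other part of the YAML content. Based on other posts I have seen about ruamel.yaml, I fully accept that I may need to dump the updated YAML to a file and reload it for the in-memory aliases to pick up the new value. I simply want to change:

Input file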

aliases:
  - &some_anchor Old value

usage: *some_anchor

to:

Output file

aliases:
  - &some_anchor NEW VALUE

usage: *some_anchor

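Instead, here is the output of the two examples above:

First, note that the original anchor was destroyed, and the value of top::hash:stringified_alias: now carries the original anchor and the old value, rather than being an alias to the newly updated scalar value at ['aliases'][1]:
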
---
# Post-header comment

# Reusable aliases
aliases:
  - &plain_value This is unencrypted
  - New string password
  - &block_password >
    ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEw
    DQYJKoZIhvcNAQEBBQAEggEAojErrxuNcdX6oR+VA/I3PyuV2CwXx166nIUp
    asEHo1/CiCIoE3qCnjK2FJF8vg+l3AqRmdb7vYrqQ+30RFfHSlB9zApSw8NW
    tnEpawX4hhKAxnTc/JKStLLu2k7iZkhkor/UA2HeVJcCzEeYAwuOQRPaolmQ
    TGHjvm2w6lhFDKFkmETD/tq4gQNcOgLmJ+Pqhogr/5FmGOpJ7VGjpeUwLteM
    er3oQozp4l2bUTJ8wk9xY6cN+eeOIcWXCPPdNetoKcVropiwrYH8QV4CZ2Ky
    u0vpiybEuBCKhr1EpfqhrtuG5s817eOb7+Wf5ctR0rPuxlTUqdnDY31zZ3Kb
    mcjqHDBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBBATq6BjaxU2bfcLL5S
    bxzsgCDsWzggzxsCw4Dp0uYLwvMKjJEpMLeFXGrLHJzTF6U2Nw==]

# ... snip ...

top::hash:
  ignore: more
  # This pulls its string-form value from above
  stringified_alias: &string_password ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAYnFbMveZGBgd9aw7h4VV+M202zRdcP96UQs1q+ViznJK2Ee08hoW9jdIqVhNaecYALUihKjVYijJa649VF7BLZXV0svLEHD8LZeduoLS3iC9uszdhDFB2Q6R/Vv/ARjHNoWc6/D0nFN9vwcrQNITnvREl0WXYpR9SmW0krUpyr90gSAxTxPNJVlEOtA0afeJiXOtQEu/b8n+UDM3eXXRO+2SEXM4ub7fNcj6V9DgT3WwKBUjqzQ5DicnB19FNQ1cBGcmCo8qRv0JtbVqZ4+WJFGc06hOTcAJPsAaWWUn80ChcTnl4ELNzpJFoxAxHgepirskuIvuWZv3h/PL8Ez3NDBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBBSuVIsvWXMmdFJtJmtJxXxgCAGFCioe/zdphGqynmj6vVDnCjA3Xc0VPOCmmCl/cTKdg==]

# ... snip ...

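Second, note that ['aliases'][1] now looks right (it is the new value with the original anchor), but where I expect to see its alias, I see the old value instead. I expect to see *string_password rather than ENC[...]: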

---
# Post-header comment

# Reusable aliases
aliases:
  - &plain_value This is unencrypted
  - &string_password New string password
  - &block_password >-
    New string password

# ... snip ...

top::hash:
  ignore: more
  # This pulls its string-form value from above
  stringified_alias: ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAYnFbMveZGBgd9aw7h4VV+M202zRdcP96UQs1q+ViznJK2Ee08hoW9jdIqVhNaecYALUihKjVYijJa649VF7BLZXV0svLEHD8LZeduoLS3iC9uszdhDFB2Q6R/Vv/ARjHNoWc6/D0nFN9vwcrQNITnvREl0WXYpR9SmW0krUpyr90gSAxTxPNJVlEOtA0afeJiXOtQEu/b8n+UDM3eXXRO+2SEXM4ub7fNcj6V9DgT3WwKBUjqzQ5DicnB19FNQ1cBGcmCo8qRv0JtbVqZ4+WJFGc06hOTcAJPsAaWWUn80ChcTnl4ELNzpJFoxAxHgepirskuIvuWZv3h/PL8Ez3NDBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBBSuVIsvWXMmdFJtJmtJxXxgCAGFCioe/zdphGqynmj6vVDnCjA3Xc0VPOCmmCl/cTKdg==]

# ... snip ...

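If you read in an anchored scalar, such as This is unencrypted, using ruamel.yaml, you get a PlainScalarString object (or an instance of another ScalarString subclass), which is an extremely thin layer around the basic string type. That layer has an attribute to store an anchor if applicable (its other uses are primarily to maintain quoting/literal/folding style information). Any alias using that anchor refers to the same ScalarString instance.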

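When dumping, the anchor attribute is not used to create aliases; that is done in the normal way, by having multiple references to the same object. The attribute is only used to write the anchor id, and this also happens if the attribute is present but there are no further references (i.e. an anchor without aliases).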

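So if you replace such an object that has multiple references (at the anchor spot or at any of the alias spots), the reference disappears. If you additionally force the same anchor name on some other object, you get duplicate anchors: contrary to normal anchor/alias generation, no checks are done for "forced" anchors.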
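
The effect can be illustrated with a toy model. The sketch below is not ruamel.yaml's implementation: Scalar, walk, and toy_dump are hypothetical names, and the "dumper" merely emits an alias when it meets the same object a second time and writes any forced anchor name without checking for duplicates.

```python
class Scalar(str):
    """Hypothetical stand-in for a ScalarString: a str with an optional anchor name."""
    def __new__(cls, value, anchor=None):
        obj = super().__new__(cls, value)
        obj.anchor = anchor
        return obj

def walk(node):
    """Yield every Scalar reachable through nested dicts and lists, in order."""
    if isinstance(node, dict):
        for value in node.values():
            yield from walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item)
    elif isinstance(node, Scalar):
        yield node

def toy_dump(data):
    """Render scalars the way an identity-based dumper might (toy model only).

    Assumes every shared scalar carries an anchor name.
    """
    emitted = set()
    out = []
    for s in walk(data):
        if id(s) in emitted:
            out.append(f'*{s.anchor}')          # same object seen again: alias
        elif s.anchor is not None:
            out.append(f'&{s.anchor} {s}')      # forced anchor, no duplicate check
        else:
            out.append(str(s))
        emitted.add(id(s))
    return out

old = Scalar('Old value', anchor='some_anchor')
data = {'aliases': [old], 'usage': old}
before = toy_dump(data)

# Replace only the anchored spot with a new object that reuses the anchor name.
data['aliases'][0] = Scalar('NEW VALUE', anchor='some_anchor')
after = toy_dump(data)

print(before)  # ['&some_anchor Old value', '*some_anchor']
print(after)   # ['&some_anchor NEW VALUE', '&some_anchor Old value']
```

The second result reproduces the symptom from the question: the former alias spot still holds the old object, which is no longer shared, so it is written out with its own copy of the anchor name and the old value.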

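Since ScalarStrings are such thin wrappers, they are essentially immutable objects, just like the string itself. This is unlike aliased dicts and lists, which are collection objects that can be emptied and then refilled (instead of being replaced with a new instance); you cannot do that with a string.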
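
The contrast can be seen with plain Python objects, without any YAML library. This sketch only uses built-in types; the dictionary keys are made up for illustration.

```python
# A list shared by two places can be emptied and refilled in place,
# so every reference sees the new content.
shared = ['old', 'items']
doc = {'anchored': shared, 'aliased': shared}
shared.clear()
shared.extend(['new', 'items'])
list_still_shared = doc['anchored'] is doc['aliased']

# A string offers no in-place update; item assignment raises TypeError.
text = 'Old value'
doc2 = {'anchored': text, 'aliased': text}
try:
    text[0] = 'N'
    mutated = True
except TypeError:
    mutated = False

# Rebinding one spot leaves the other spot pointing at the old object.
doc2['anchored'] = 'NEW VALUE'

print(doc['aliased'], list_still_shared, mutated, doc2['aliased'])
```

For strings, the only way to make every spot show the new value is therefore to rebind each spot that referenced the old object.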

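The implementation of ScalarString can of course be changed so that you can have your set_value() method, but that involves creating alternative classes for all such objects (PlainScalarString, FoldedScalarString). You would have to make sure these are used for constructing and representing, and preferably that they also behave like normal strings as far as you need them to, so that you can at least print them. That is relatively easy to do, but requires copying and slightly modifying several dozen lines of code.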

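I think it is easier to leave the ScalarStrings as they are (i.e. immutable) and, if you want to change all occurrences (i.e. references), do what you need to do: update all references to the original. If your data structure contained millions of nodes, that might be prohibitively time-consuming, but it would still be only a fraction of what loading and dumping the YAML itself takes: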

import sys
from pathlib import Path
import ruamel.yaml

in_file = Path('test.yaml')

def update_aliased_scalar(data, obj, val):
    def recurse(d, ref, nv):
        if isinstance(d, dict):
            for i, k in [(idx, key) for idx, key in enumerate(d.keys()) if key is ref]:
                d.insert(i, nv, d.pop(k))
            for k, v in d.non_merged_items():
                if v is ref:
                    d[k] = nv
                else:
                    recurse(v, ref, nv)
        elif isinstance(d, list):
            for idx, item in enumerate(d):
                if item is ref:
                    d[idx] = nv
                else:
                    recurse(item, ref, nv)

    if hasattr(obj, 'anchor'):
        recurse(data, obj, type(obj)(val, anchor=obj.anchor.value))
    else:
        recurse(data, obj, type(obj)(val))

yaml = ruamel.yaml.YAML()
yaml.indent(mapping=2, sequence=4, offset=2)
yaml.preserve_quotes = True
data = yaml.load(in_file)

update_aliased_scalar(data, data['aliases'][1], "New string password")
update_aliased_scalar(data, data['top::hash']['sub']['blocked_alias'], "New block password\n")

yaml.dump(data, sys.stdout)

which gives:

# Post-header comment

# Reusable aliases
aliases:
  - &plain_value This is unencrypted
  - &string_password New string password
  - &block_password >
    New block password

top_key: unencrypted value
top_alias: *plain_value

top::hash:
  ignore: more
  # This pulls its string-form value from above
  stringified_alias: *string_password
  sub:
    ignore: value
    key: unencrypted subbed-value
    # This pulls its block-form value from above
    blocked_alias: *block_password
  sub_more:
    # This is a stringified EYAML value, NOT an alias
    inline_string: ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAafmyrrae2kx8HdyPmn/RHQRcTPhqpx5Idm12hCDCIbwVM++H+c620z4EN2wlugz/GcLaiGsybaVWzAZ+3r+1+EwXn5ec4dJ5TTqo7oxThwUMa+SHliipDJwGoGii/H+y2I+3+irhDYmACL2nyJ4dv4IUXwqkv6nh1J9MwcOkGES2SKiDm/WwfkbPIZc3ccp1FI9AX/m3SVqEcvsrAfw6HtkolM22csfuJREHkTp7nBapDvOkWn4plzfOw9VhPKhq1x9DUCVFqqG/HAKv++v4osClK6k1MmSJWaMHrW1z3n7LftV9ZZ60E0Cgro2xSaD+itRwBp07H0GeWuoKB4+44TBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBCRv9r2lvQ1GJMoD064EtdigCCw43EAKZWOc41yEjknjRaWDm1VUug6I90lxCsUrxoaMA==]
    # Also NOT an alias, in block form
    block_string: >
      ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMIIBHQIBADAFMAACAQEw
      DQYJKoZIhvcNAQEBBQAEggEAafmyrrae2kx8HdyPmn/RHQRcTPhqpx5Idm12
      hCDCIbwVM++H+c620z4EN2wlugz/GcLaiGsybaVWzAZ+3r+1+EwXn5ec4dJ5
      TTqo7oxThwUMa+SHliipDJwGoGii/H+y2I+3+irhDYmACL2nyJ4dv4IUXwqk
      v6nh1J9MwcOkGES2SKiDm/WwfkbPIZc3ccp1FI9AX/m3SVqEcvsrAfw6Htko
      lM22csfuJREHkTp7nBapDvOkWn4plzfOw9VhPKhq1x9DUCVFqqG/HAKv++v4
      osClK6k1MmSJWaMHrW1z3n7LftV9ZZ60E0Cgro2xSaD+itRwBp07H0GeWuoK
      B4+44TBMBgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBCRv9r2lvQ1GJMoD064
      EtdigCCw43EAKZWOc41yEjknjRaWDm1VUug6I90lxCsUrxoaMA==]

# Signature line

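As you can see, the anchors are preserved, and it does not matter to update_aliased_scalar whether you provide the anchored "place" or one of the aliased places as the reference.

The recurse function above also handles keys that are aliased, as it is perfectly fine for a key in a YAML mapping to have an anchor or to be an alias. You can even have an anchored key whose value is an alias to that same key.

It would be great if in-place modification of existing anchored fields of types such as ScalarFloat/ScalarInt were supported. YAML is commonly used for config files. A common use case I run into is creating multiple config files from one very large template config file, with only small changes in each new file. I load the template file into a CommentedMap, modify a small set of keys in place, and dump it back out to a new yaml config file. This flow works very well when the keys to be changed are not anchored. When they are anchored, the anchors get duplicated in the new file, as the OP reported, which makes the file invalid. Manually addressing each anchored key in post-processing can be daunting when there are many of them.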