How to format a string in YAML dump?

Question

Dumping a multi-line string with ruamel.yaml gives the following result:

address_pattern_template: "\n^                           #the beginning of the address\
  \ string (e.g. interface number)\n(?P<junkbefore>             #capturing the junk\
  \ before the address\n    \D?                     #an optional non-digit character\n\
  \    .*?                     #any characters (non-greedy) up to the address\n)\n\
  (?P<address>                #capturing the pure address\n    {pure_address_pattern}\n\
  )\n(?P<junkafter>              #capturing the junk after the address\n    \D? \
  \                    #an optional non-digit character\n    .*                  \
  \    #any characters (greedy) up to the end of the string\n)\n$                \
  \           #the end of the input address string\n"

The code looks like this:

from ruamel.yaml import YAML
data =dict(
address_pattern_template="""
^                           #the beginning of the address string (e.g. interface number)
(?P<junkbefore>             #capturing the junk before the address
    \D?                     #an optional non-digit character
    .*?                     #any characters (non-greedy) up to the address
)
(?P<address>                #capturing the pure address
    {pure_address_pattern}
)
(?P<junkafter>              #capturing the junk after the address
    \D?                     #an optional non-digit character
    .*                      #any characters (greedy) up to the end of the string
)
$                           #the end of the input address string
"""
)
yaml = YAML(typ='safe', pure=True)
yaml.default_flow_style = False
with open(r'D:\datadump.yml', 'w') as dumpfile:  # raw string, so "\d" is not treated as an escape sequence
    yaml.dump(data, dumpfile)

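I'd like to see multi-line strings in a readable format, i.e. newlines should produce actual line breaks instead of being shown as "\n".

Which flags/options can I set to make the output look like this: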

address_pattern_template: |
  ^                           #the beginning of the address string (e.g. interface number)
  (?P<junkbefore>             #capturing the junk before the address
      \D?                     #an optional non-digit character
      .*?                     #any characters (non-greedy) up to the address
  )
  (?P<address>                #capturing the pure address
      {pure_address_pattern}
  )
  (?P<junkafter>              #capturing the junk after the address
      \D?                     #an optional non-digit character
      .*                      #any characters (greedy) up to the end of the string
  )
  $                           #the end of the input address string

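Note that my program logs a large dictionary, and multi-line strings like this can appear anywhere in the dictionary structure, at any depth. Traversing the dict tree and loading each of them before dumping (as suggested in "Can I control the formatting of multiline strings?") is therefore not a good solution for me.

I'd like to know whether the dumper can be given a parameter that makes it recognize multi-line strings and dump them in block format. Single-line strings can still stay on the same line as the colon. That would make the log file most readable.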

Answer 1

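First of all, what you present as your desired output is not a representation of the data you provide. Since the multi-line string in that data starts with a newline, the block style literal scalar requires a block indentation indicator and a newline at the start: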

address_pattern_template: |2

  ^                           #the beginning of the address string (e.g. interface number)
  .
  .
  .

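However, it makes no sense (at least to me) for these patterns to start with a newline, so I will leave it out in the following.

If you don't know where in your data structure the multi-line strings are, but you can convert the data in place before dumping, then you can use ruamel.yaml.scalarstring.walk_tree: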
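The difference between the question's string and the one used in the rest of this answer comes down to how the triple-quoted literal is opened. The following is a small plain-Python sketch of that point; the variable names and the shortened pattern text are made up for illustration.

```python
# A triple-quoted literal that opens with a bare newline keeps that newline.
with_leading_newline = """
^   #the beginning
$   #the end
"""

# A backslash directly after the opening quotes continues the line,
# so the string starts with the first real character.
without_leading_newline = """\
^   #the beginning
$   #the end
"""

print(repr(with_leading_newline[:2]))
print(repr(without_leading_newline[:2]))

# If the literal itself cannot be changed, stripping the leading
# newline before dumping yields the same value.
stripped = with_leading_newline.lstrip("\n")
print(stripped == without_leading_newline)
```

Either form removes the need for the explicit indentation indicator shown above.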

import sys
import ruamel.yaml

data = dict(a=[1, 2, 3, dict(
address_pattern_template="""\
^                           #the beginning of the address string (e.g. interface number)
(?P<junkbefore>             #capturing the junk before the address
    \D?                     #an optional non-digit character
    .*?                     #any characters (non-greedy) up to the address
)
(?P<address>                #capturing the pure address
    {pure_address_pattern}
)
(?P<junkafter>              #capturing the junk after the address
    \D?                     #an optional non-digit character
    .*                      #any characters (greedy) up to the end of the string
)
$                           #the end of the input address string
"""
)])


yaml = ruamel.yaml.YAML()
ruamel.yaml.scalarstring.walk_tree(data)
yaml.dump(data, sys.stdout)

which gives:

a:
- 1
- 2
- 3
- address_pattern_template: |
    ^                           #the beginning of the address string (e.g. interface number)
    (?P<junkbefore>             #capturing the junk before the address
        \D?                     #an optional non-digit character
        .*?                     #any characters (non-greedy) up to the address
    )
    (?P<address>                #capturing the pure address
        {pure_address_pattern}
    )
    (?P<junkafter>              #capturing the junk after the address
        \D?                     #an optional non-digit character
        .*                      #any characters (greedy) up to the end of the string
    )
    $                           #the end of the input address string

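walk_tree replaces the multi-line strings with LiteralScalarString, which for most purposes behaves like a normal string.

If in-place conversion is not acceptable, you can make a deep copy of the data first and then apply walk_tree to the copy. If that is not acceptable because of memory constraints, you have to provide an alternative representer for strings that checks, during representation, whether it is dealing with a multi-line string. Preferably you do this in a subclass of the Representer: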

import sys
import ruamel.yaml

# data defined as before

class MyRepresenter(ruamel.yaml.representer.RoundTripRepresenter):
    def represent_str(self, data):
        style = '|' if '\n' in data else None
        return self.represent_scalar(u'tag:yaml.org,2002:str', data, style=style)


MyRepresenter.add_representer(str, MyRepresenter.represent_str)

yaml = ruamel.yaml.YAML()
yaml.Representer = MyRepresenter
yaml.dump(data, sys.stdout)

which gives the same output as the previous example.
