Load specific PyYAML documents from file

Question

I have a .yml file, and I'm trying to load certain documents from it. I know that:

print yaml.load(open('doc_to_open.yml', 'r+'))

will open the first (or only) document in the .yml file, and that:

for x in yaml.load_all(open('doc_to_open.yml', 'r+')):
    print x

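will print all the YAML documents in the file. But say I only want to open the first three documents in the file, or only the 8th document in the file. How would I do that?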

Answer 1

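If you don't want to parse the first seven YAML documents at all, e.g. for efficiency reasons, you will have to search for the 8th document yourself.

It is possible to hook into the first stages of the parser and count the number of DocumentStartTokens() within the stream, only start passing on the tokens after the 8th one and stop doing so at the 9th, but doing that is far from trivial. And it would still, at the very least, tokenize all of the preceding documents.

The completely inefficient way is to use .load_all() and select the appropriate document, after tokenizing/parsing/composing/resolving all of the documents ¹: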

import sys
import ruamel.yaml

yaml = ruamel.yaml.YAML()
with open('input.yaml') as fp:
    for idx, data in enumerate(yaml.load_all(fp)):
        if idx == 7:  # zero-based index, so this is the 8th document
            yaml.dump(data, sys.stdout)

If you run the above on a file input.yaml:

---
document: 0
---
document: 1
---
document: 2
---
document: 3
---
document: 4
---
document: 5
---
document: 6
---
document: 7   # < the 8th document
---
document: 8
---
document: 9
...

you get the output:

document: 7   # < the 8th document
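
Since .load_all() returns a generator, the same selection can be written with itertools.islice, which also covers the "first three documents" case from the question. The following is only a sketch: the helper names first_documents and nth_document are made up for illustration, and the commented usage assumes PyYAML's yaml.safe_load_all. The preceding documents are still fully parsed, but iteration stops as soon as the requested document has been produced, so the documents after it are not constructed.

```python
from itertools import islice


def first_documents(docs, count):
    """Return the first `count` items of an iterable of documents as a list."""
    return list(islice(docs, count))


def nth_document(docs, n):
    """Return the n-th document (1-based), or None if there are fewer than n."""
    return next(islice(docs, n - 1, n), None)


# Hypothetical usage with PyYAML (not executed here). The generator has to be
# consumed while the file is still open:
#
#   import yaml
#   with open('doc_to_open.yml') as fp:
#       eighth = nth_document(yaml.safe_load_all(fp), 8)
#   with open('doc_to_open.yml') as fp:
#       first_three = first_documents(yaml.safe_load_all(fp), 3)
```

Both helpers work on any iterable, so they should behave the same with the generator returned by ruamel.yaml's .load_all().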

Unfortunately you cannot naively just count the number of document markers (---), as a document doesn't have to start with one:

document: 0
---
document: 1
.
.

Nor does the marker have to be on the first line if the file starts with a directive ²:

%YAML 1.2
---
document: 0
---
document: 1
.
.

or if it starts with a "document" consisting of comments only:

# the 8th document is the interesting one
---
document: 0
---
document: 1
.
.
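
To make the problem concrete, here is a small sketch of what goes wrong with a purely marker-based split. The function naive_nth_chunk is a hypothetical strawman, not something PyYAML or ruamel.yaml provides; it just splits the text on '---' lines and picks a chunk by position.

```python
def naive_nth_chunk(text, n):
    """Return the n-th chunk (1-based) after splitting on '---' marker lines.

    Deliberately naive: it assumes every chunk is a document, which is
    exactly the assumption that the examples above break.
    """
    return text.split('---\n')[n - 1]


plain = "document: 0\n---\ndocument: 1\n"
explicit = "---\ndocument: 0\n---\ndocument: 1\n"
directive = "%YAML 1.2\n---\ndocument: 0\n---\ndocument: 1\n"
comment = "# only a comment\n---\ndocument: 0\n---\ndocument: 1\n"

# All four streams contain the same two documents, but asking for the
# "2nd chunk" only yields the 2nd document for the first stream.
for name, text in [("plain", plain), ("explicit", explicit),
                   ("directive", directive), ("comment", comment)]:
    print(name, repr(naive_nth_chunk(text, 2)))
```

Only the stream without a leading marker gives 'document: 1\n'; for the other three the second chunk is 'document: 0\n', because the first chunk is empty, a directive, or a comment.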

Taking all of that into account you can use:

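def get_nth_yaml_doc(stream, doc_nr):
    # doc_nr is 1-based; doc_idx is the number of the document being read
    doc_idx = 0
    data = []
    for line in stream:
        # an explicit document start marker begins the next document
        if line == u'---\n' or line.startswith('--- '):
            doc_idx += 1
            continue
        # an explicit document end marker stops the scan
        if line == '...\n':
            break
        # already past the requested document
        if doc_nr < doc_idx:
            break
        # skip directives such as %YAML 1.2
        if line.startswith(u'%'):
            continue
        if doc_idx == 0:  # YAML files don't have to start with an initial '---'
            # leading comments don't start a document
            if line.lstrip().startswith('#'):
                continue
            doc_idx = 1
        if doc_idx == doc_nr:
            data.append(line)
    return yaml.load(''.join(data))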

with open("input.yaml") as fp:
    data = get_nth_yaml_doc(fp, 8)
yaml.dump(data, sys.stdout)

and get:

document: 7   # < the 8th document

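This works in all of the above cases and is really efficient, as it doesn't even tokenize the preceding (or the following) YAML documents.

There is an additional caveat: a YAML file can start with a byte-order-marker, and the individual documents within a stream can start with these markers as well. The above routine doesn't handle that.

¹ This was done using ruamel.yaml, of which I am the author, and which is an enhanced version of PyYAML. AFAIK PyYAML would work the same (but would e.g. drop the comment on the roundtrip).
² Technically the directive is in its own directives document, so you should count that as a document, but .load_all() doesn't give you that document back, so I don't count it as such.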
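
One possible way to cope with byte-order marks, sketched under the assumption that the stream has already been decoded to text so that a BOM shows up as the character U+FEFF: wrap the stream in a generator that drops that character from the start of every line before the lines reach a line-based routine. The name strip_boms is hypothetical, and the approach is a simplification, since it would also remove a U+FEFF at the start of a line inside a multi-line scalar.

```python
BOM = u'\ufeff'


def strip_boms(lines):
    """Yield each line with any leading byte-order mark (U+FEFF) removed."""
    for line in lines:
        yield line.lstrip(BOM)


# Hypothetical usage with a line-based routine such as get_nth_yaml_doc:
#
#   with open("input.yaml") as fp:
#       data = get_nth_yaml_doc(strip_boms(fp), 8)
```

For a BOM only at the very start of a UTF-8 file, opening the file with encoding='utf-8-sig' (Python 3) removes it without any wrapper.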

python

yaml

pyyaml

python-2.7