How to parse a YAML file with multiple documents?
Here is my parsing code:
import yaml

def yaml_as_python(val):
    """Convert YAML to dict"""
    try:
        return yaml.load_all(val)
    except yaml.YAMLError as exc:
        return exc

with open('circuits-small.yaml', 'r') as input_file:
    results = yaml_as_python(input_file)
    print(results)
    for value in results:
        print(value)
Here is a sample of the file:
ingests:
- timestamp: 1970-01-01T00:00:00.000Z
  id: SwitchBank_35496721
  attrs:
    Feeder: Line_928
    Switch.normalOpen: 'true'
    IdentifiedObject.description: SwitchBank
    IdentifiedObject.mRID: SwitchBank_35496721
    PowerSystemResource.circuit: '928'
    IdentifiedObject.name: SwitchBank_35496721
    IdentifiedObject.aliasName: SwitchBank_35496721
  loc: vector [43.05292, -76.126800000000003, 0.0]
  kind: SwitchBank
- timestamp: 1970-01-01T00:00:00.000Z
  id: UndergroundDistributionLineSegment_34862802
  attrs:
    Feeder: Line_928
    status: de-energized
    IdentifiedObject.description: UndergroundDistributionLineSegment
    IdentifiedObject.mRID: UndergroundDistributionLineSegment_34862802
    PowerSystemResource.circuit: '928'
    IdentifiedObject.name: UndergroundDistributionLineSegment_34862802
  path:
  - vector [43.052942000000002, -76.126716000000002, 0.0]
  - vector [43.052585000000001, -76.126515999999995, 0.0]
  kind: UndergroundDistributionLineSegment
- timestamp: 1970-01-01T00:00:00.000Z
  id: UndergroundDistributionLineSegment_34806014
  attrs:
    Feeder: Line_928
    status: de-energized
    IdentifiedObject.description: UndergroundDistributionLineSegment
    IdentifiedObject.mRID: UndergroundDistributionLineSegment_34806014
    PowerSystemResource.circuit: '928'
    IdentifiedObject.name: UndergroundDistributionLineSegment_34806014
  path:
  - vector [43.05292, -76.126800000000003, 0.0]
  - vector [43.052928999999999, -76.126766000000003, 0.0]
  - vector [43.052942000000002, -76.126716000000002, 0.0]
  kind: UndergroundDistributionLineSegment
...
ingests:
- timestamp: 1970-01-01T00:00:00.000Z
  id: OverheadDistributionLineSegment_31168454
In the traceback, note that it starts having problems at the ...:
Traceback (most recent call last):
  File "convert.py", line 29, in <module>
    for value in results:
  File "/Users/conduce-laptop/anaconda2/lib/python2.7/site-packages/yaml/__init__.py", line 82, in load_all
    while loader.check_data():
  File "/Users/conduce-laptop/anaconda2/lib/python2.7/site-packages/yaml/constructor.py", line 28, in check_data
    return self.check_node()
  File "/Users/conduce-laptop/anaconda2/lib/python2.7/site-packages/yaml/composer.py", line 18, in check_node
    if self.check_event(StreamStartEvent):
  File "/Users/conduce-laptop/anaconda2/lib/python2.7/site-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
  File "/Users/conduce-laptop/anaconda2/lib/python2.7/site-packages/yaml/parser.py", line 174, in parse_document_start
    self.peek_token().start_mark)
yaml.parser.ParserError: expected '<document start>', but found '<block mapping start>'
  in "circuits-small.yaml", line 42, column 1
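What I want is to parse each of these documents into a separate object, perhaps all in the same list, or pretty much anything else that works with the PyYAML module. I believe the ... is actually valid YAML, so I'm surprised it isn't handled automatically.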
I think your YAML is invalid.
Look at the second document in your sample: it starts with ... instead of ---:
...
ingests:
- timestamp: 1970-01-01T00:00:00.000Z
  id: OverheadDistributionLineSegment_31168454
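A minimal sketch of the problem and the fix with PyYAML. The two tiny documents are made-up stand-ins for the sample file and parse_all() is a hypothetical helper. A bare document after ... is rejected with the same ParserError as in the traceback, while inserting --- after the ... lets both documents parse:

```python
import yaml

# Hypothetical, trimmed-down stand-ins for the sample file.
BROKEN = """\
ingests:
- id: SwitchBank_35496721
...
ingests:
- id: OverheadDistributionLineSegment_31168454
"""

# Same stream, but the second document now starts with an explicit '---'.
FIXED = BROKEN.replace("...\n", "...\n---\n")


def parse_all(text):
    """Return every document in `text` as a list, or the YAMLError raised."""
    try:
        # safe_load_all() is lazy; list() is what actually drives the parser.
        return list(yaml.safe_load_all(text))
    except yaml.YAMLError as exc:
        return exc


print(type(parse_all(BROKEN)).__name__)  # ParserError
print(len(parse_all(FIXED)))             # 2
```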
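The error message is quite specific: a document needs to begin with a document start marker. Your first document doesn't have such a marker, although it does have a document end marker. After you explicitly end the first document with ..., PyYAML no longer lets you have a document without document boundary markers; you have to start it explicitly with ---: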
The end of your file should look like this:
  kind: UndergroundDistributionLineSegment
...
---
ingests:
- timestamp: 1970-01-01T00:00:00.000Z
  id: OverheadDistributionLineSegment_31168454
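You can omit the explicit document start marker for the first document, but you need to include one for every subsequent document. The document end marker is optional.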
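As a sketch of reading a file laid out like the corrected one into a list of separate objects (load_documents() and the sample content are hypothetical): safe_load_all(), like load_all(), returns a lazy generator, so parse errors are raised only while it is being iterated. That is why the try/except around the load_all() call in the question never caught the error (the traceback points at the for loop), and why the list() call has to happen while the file is still open.

```python
import os
import tempfile

import yaml


def load_documents(path):
    """Return all documents in the YAML file at `path` as a list.

    safe_load_all() returns a lazy generator: nothing is parsed until it is
    iterated, so list() must run inside the `with` block, while the file is
    still open.
    """
    with open(path, "r") as input_file:
        return list(yaml.safe_load_all(input_file))


# Usage with a temporary, made-up stand-in for circuits-small.yaml.
SAMPLE = """\
ingests:
- id: SwitchBank_35496721
  kind: SwitchBank
...
---
ingests:
- id: OverheadDistributionLineSegment_31168454
"""

with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as tmp:
    tmp.write(SAMPLE)

try:
    documents = load_documents(tmp.name)
finally:
    os.remove(tmp.name)

# Each document is now a separate dict in one list.
for document in documents:
    print(document["ingests"][0]["id"])
```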
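Using .load_all() is unsafe if you don't have full control over the input. There is usually no reason to take that risk: use .safe_load_all() instead, and extend the SafeLoader to handle any specific tags your YAML may contain.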
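A sketch of extending SafeLoader, assuming a hypothetical !vector tag (the sample file actually stores its vector [...] values as plain strings, so no tag is needed for it as-is). Calling add_constructor() on a subclass leaves yaml.SafeLoader itself unchanged, and yaml.load_all() with that Loader then uses only the safe constructors plus the one you added:

```python
import yaml


class CircuitLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass; constructors added here stay local to it."""


def construct_vector(loader, node):
    # Build a tuple of floats from a flow sequence such as [43.05, -76.12, 0.0].
    return tuple(float(value) for value in loader.construct_sequence(node))


# '!vector' is a made-up tag used only for illustration.
CircuitLoader.add_constructor("!vector", construct_vector)

TEXT = """\
loc: !vector [43.05292, -76.126800000000003, 0.0]
---
loc: !vector [43.052942000000002, -76.126716000000002, 0.0]
"""

# Safe constructors plus '!vector'; arbitrary Python object tags stay rejected.
documents = list(yaml.load_all(TEXT, Loader=CircuitLoader))
print(documents[0]["loc"])
```

Plain yaml.safe_load_all() would reject the same stream, because the stock SafeLoader has no constructor for !vector.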
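Apart from that, you should start your YAML document with an explicit version directive before the document start indicator (which you should also add to the first document):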
%YAML 1.1
---
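To check that a stream written this way parses, here is a small sketch using PyYAML's low-level yaml.parse() event API (the document content is a made-up stand-in). The %YAML 1.1 directive is recorded only on the first document's DocumentStartEvent, since PyYAML resets directives for each document, and every document start is explicit:

```python
import yaml

# Made-up stand-in: version directive, then explicit '---' on every document.
TEXT = """\
%YAML 1.1
---
ingests:
- id: SwitchBank_35496721
...
---
ingests:
- id: OverheadDistributionLineSegment_31168454
"""

# Collect the document-start events emitted by the parser.
starts = [event for event in yaml.parse(TEXT, Loader=yaml.SafeLoader)
          if isinstance(event, yaml.DocumentStartEvent)]
print([(event.explicit, event.version) for event in starts])

# The same stream also loads into two separate objects.
documents = list(yaml.safe_load_all(TEXT))
```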
This is for the benefit of future editors of your YAML file, because you are using PyYAML, which only supports (most of) YAML 1.1 and not the YAML 1.2 specification (from 2009). The alternative, of course, is to upgrade your YAML parser to e.g. ruamel.yaml, which would also have warned you about your use of the unsafe load_all() (disclaimer: I am the author of that parser). ruamel.yaml doesn't allow you to have a bare document after an explicit end-of-document marker (which is allowed, as @flyx pointed out), which is a bug.