Removing all blank lines but not comments in ruamel.yaml
I am using the ruamel.yaml Python library to programmatically update human-edited YAML files.
I have data like this:
---
a:
  b: '1'

  c: "2"


  d: 3
  # Comment.
  e: 4
I don't know in advance where the comments are or where the blank lines are.
I need to reset it to:
---
a:
  b: '1'
  c: "2"
  d: 3
  # Comment.
  e: 4
I can see how to simply strip all comments, but I don't know how to look inside a CommentToken to check whether it holds a comment I need to keep.
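Earlier versions of ruamel.yaml did not preserve blank lines, and it is relatively easy to get that behaviour back by stripping newlines at the one place every comment passes through when it is emitted: Emitter.write_comment() in ruamel/yaml/emitter.py. Fortunately, lines consisting of spaces followed by a newline have already been reduced to a bare newline by that point. Essentially, instead of searching the data for attached comments and working out how to rewrite them, you let the comments come to you.
I include some extra combinations of comments and empty lines to test the functionality: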
import sys
import ruamel.yaml

yaml_str = """\
---
a:
  b: '1'
  # comment followed by empty lines


  c: "2"
  d: 3
  # Comment.
  e: 4


  # empty lines followed by comment
  f: 5

  # comment between empty lines

  g: |+
    an empty line within a multi-line literal

    with a trailing empty line that is not stripped

  h: 6
# final top level comment
"""

# rename the comment writer
ruamel.yaml.emitter.Emitter.write_comment_org = ruamel.yaml.emitter.Emitter.write_comment


# define your own comment writer that calls the original if the comment is not empty
def strip_empty_lines_write_comment(self, comment):
    # print('{:02d} {:02d} {!r}'.format(self.column, comment.start_mark.column, comment.value))
    comment.value = comment.value.replace('\n', '')
    if comment.value:
        self.write_comment_org(comment)


# install
ruamel.yaml.emitter.Emitter.write_comment = strip_empty_lines_write_comment

# round_trip_load/round_trip_dump are the legacy function API;
# newer ruamel.yaml releases use YAML() instances instead
data = ruamel.yaml.round_trip_load(yaml_str, preserve_quotes=True)
ruamel.yaml.round_trip_dump(data, sys.stdout)
gives:
a:
  b: '1'
  # comment followed by empty lines
  c: "2"
  d: 3
  # Comment.
  e: 4
  # empty lines followed by comment
  f: 5
  # comment between empty lines
  g: |+
    an empty line within a multi-line literal

    with a trailing empty line that is not stripped

  h: 6
# final top level comment
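This will of course affect all data dumped after "installing" strip_empty_lines_write_comment. If your program also needs to dump data with its empty lines intact, you need to subclass Emitter as a StrippingEmitter and make a StrippingRoundTripDumper (modelled on RoundTripDumper in ruamel/yaml/dumper.py) that uses that subclass.
(You can of course remove the commented-out debug print statement from the code.)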
This doesn't solve the problem exactly as I asked it, but for what it's worth, I ended up with this:
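data = ruamel.yaml.round_trip_load(yaml_str, preserve_quotes=True)

def newline_comment(value):
    # comment entry whose third slot holds a token made only of newlines
    token = ruamel.yaml.tokens.CommentToken(value, ruamel.yaml.error.CommentMark(0), None)
    return [None, None, token, None]

space, no_space = newline_comment('\n\n'), newline_comment('\n')
# overwrite every existing comment entry, real comments included
for key in data['a'].ca.items:
    data['a'].ca.items[key] = no_space
# keys() may not be subscriptable on Python 3, so go through a list
last = list(data['a'])[-1]
data['a'].ca.items[last] = space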
That is, for now I simply gave up on preserving any comments other than whitespace.
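FWIW, I ended up with the following. It drops any comment that is empty once stripped, i.e. one that consists only of whitespace. Comments with any actual content are kept.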
import ruamel.yaml


def monkeypatch_emitter():
    # keep a reference to the original method so the wrapper can delegate to it
    ruamel.yaml.emitter.Emitter.old_write_comment = ruamel.yaml.emitter.Emitter.write_comment

    def write_comment(self, comment, *args, **kwargs):
        # only emit comments that contain something other than whitespace
        if comment.value.strip():
            self.old_write_comment(comment, *args, **kwargs)

    ruamel.yaml.emitter.Emitter.write_comment = write_comment


def main():
    yaml = ruamel.yaml.YAML()
    # do yaml stuff here


if __name__ == '__main__':
    monkeypatch_emitter()
    main()
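The write_comment signature has probably changed somewhat since the answer above was added: a 'pre' keyword argument is now sometimes passed, which makes a wrapper with the old fixed signature fail. The code above forwards any extra arguments, so it should keep working if the signature changes again.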
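The forwarding idea can be shown without ruamel.yaml at all. In this toy sketch, LegacyEmitter and NewerEmitter are hypothetical stand-ins (not ruamel.yaml classes) for an emitter before and after a keyword argument was added, and comments are plain strings rather than tokens. The same wrapper works on both because it never names the extra arguments.

```python
class LegacyEmitter:
    """Hypothetical stand-in: write_comment takes only the comment."""

    def __init__(self):
        self.written = []

    def write_comment(self, comment):
        self.written.append(comment)


class NewerEmitter(LegacyEmitter):
    """Hypothetical stand-in: a later release that added a `pre` keyword."""

    def write_comment(self, comment, pre=False):
        self.written.append(("pre: " if pre else "") + comment)


def install_filter(emitter_cls):
    """Wrap emitter_cls.write_comment so whitespace-only comments are dropped."""
    original = emitter_cls.write_comment

    def write_comment(self, comment, *args, **kwargs):
        # forward whatever extra arguments the caller supplies, unchanged
        if comment.strip():
            return original(self, comment, *args, **kwargs)

    emitter_cls.write_comment = write_comment


if __name__ == "__main__":
    install_filter(NewerEmitter)
    emitter = NewerEmitter()
    emitter.write_comment("\n\n", pre=True)   # dropped
    emitter.write_comment("# keep", pre=True)  # forwarded with pre=True
    print(emitter.written)
```

A wrapper declared as write_comment(self, comment) would raise TypeError on the pre=True calls, which is the failure described above.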