With ruamel.yaml, how can I conditionally convert flow maps to block maps based on line length?
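I'm working on a ruamel.yaml (v0.17.4) based YAML reformatter (using the RoundTrip variant to preserve comments).
I want to allow a mix of block-style and flow-style maps, but in some cases I want to convert a flow-style map to block style.
In particular, if the flow-style map would be longer than the max line length^, I want to convert it to a block-style map instead of wrapping the line somewhere in the middle of the flow-style map.
^ By "max line length" I mean the best_width that I configure by setting something like yaml.width = 120, where yaml is a ruamel.yaml.YAML instance.
What should I extend to achieve this? The emitter is where the line length gets calculated so that wrapping can occur, but I suspect that is too late to convert between block and flow style. I'm also concerned about losing comments when I switch styles. Here are some possible extension points; can you give me a pointer on where I'm most likely to have success with this?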
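- Emitter.expect_flow_mapping(): probably too late for converting flow->block
- Serializer.serialize_node(): probably too late, as it consults node.flow_style
- RoundTripRepresenter.represent_mapping(): maybe? But this has no idea about line length
- I could also walk the data before calling yaml.dump(), but this has no idea about line length.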
So, where should I, and where can I, adjust the flow_style depending on whether a flow-style map would trigger line wrapping?
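I think the most accurate approach, when you encounter a flow-style mapping in the dumping process, is to first try to emit it to a buffer and then get the length of the buffer. If that length, combined with the column you are in, exceeds the width, you actually emit block-style.
Any attempt to guess the length of the output without actually trying to write that part of the tree is going to be hard, if not impossible, without doing the actual emit. Among other things, the dumping process actually dumps scalars and reads them back to make sure no quoting needs to be forced (e.g. when you dump a string that reads back like a date). It also handles single key-value pairs in a list in a special way ([1, a: 42, 3] instead of the more verbose [1, {a: 42}, 3]). So a simple calculation of the length of the scalars that are the keys and values, plus the separating commas, colons and spaces, is not going to be precise.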
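To make the imprecision concrete, here is a minimal sketch of the kind of simple calculation described above. The helper naive_flow_length is hypothetical (it is not part of ruamel.yaml) and works on plain Python data using only the standard library. It adds up scalar lengths and separators, and so it lands on the verbose form rather than the compact one.

```python
def naive_flow_length(node):
    """Estimate the width of a flow-style rendering of plain Python data.

    Hypothetical helper: it only adds up str() lengths and separators, and
    knows nothing about forced quoting or compact single-pair output.
    """
    if isinstance(node, dict):
        parts = [naive_flow_length(k) + len(": ") + naive_flow_length(v)
                 for k, v in node.items()]
        return len("{}") + sum(parts) + len(", ") * max(len(parts) - 1, 0)
    if isinstance(node, (list, tuple)):
        parts = [naive_flow_length(e) for e in node]
        return len("[]") + sum(parts) + len(", ") * max(len(parts) - 1, 0)
    return len(str(node))


data = [1, {"a": 42}, 3]
estimate = naive_flow_length(data)

# Compare the estimate with the verbose and the compact form from the text.
print(estimate, len("[1, {a: 42}, 3]"), len("[1, a: 42, 3]"))
```

The estimate equals the length of the verbose form, two characters more than the compact form. In the other direction, a string such as 2001-01-01 is counted as ten characters, although it needs quotes on output to stay a string.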
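An alternative approach is to dump the data with a large line width, parse the output, and build a set of the line numbers whose lines are too long for the width that you actually want to use. After loading that output back, you can walk the data structure recursively and inspect the .lc attribute to determine the line number on which a flow-style mapping (or sequence) started. If that line number is in the set you built beforehand, change the mapping to block style. If you have nested flow-style collections, you might have to repeat this process.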
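The first half of that approach, collecting the numbers of the over-long lines, can be sketched on its own with the standard library. The function name too_long_lines and the sample text are made up for illustration. Line numbers are zero-based here, on the assumption that they will be compared with the zero-based .lc.line values.

```python
def too_long_lines(dumped, width):
    """Return the zero-based numbers of the lines in `dumped` longer than `width`."""
    return {nr for nr, line in enumerate(dumped.splitlines()) if len(line) > width}


# Illustrative dump output: only the second line (index 1) exceeds 30 columns.
sample = "movie: bladerunner\nquote: {a: " + "x" * 40 + "}\nyear: 1982\n"
offenders = too_long_lines(sample, 30)
print(sorted(offenders))
```

A set keeps the later membership test cheap while walking the loaded data.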
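If you run the following, the initial dumped value for quote will be on one line.
The change_to_block method as presented changes all mappings/sequences that are too long and that are on one line.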
import sys
import ruamel.yaml
yaml_str = """\
movie: bladerunner
quote: {[Batty, Roy]: [
I have seen things you people wouldn't believe.,
Attack ships on fire off the shoulder of Orion.,
I watched C-beams glitter in the dark near the Tannhäuser Gate.,
]}
"""

class Blockify:
    def __init__(self, width, only_first=False, verbose=0):
        self._width = width
        self._yaml = None
        self._only_first = only_first
        self._verbose = verbose

    @property
    def yaml(self):
        if self._yaml is None:
            self._yaml = y = ruamel.yaml.YAML(typ=['rt', 'string'])
            y.preserve_quotes = True
            y.width = 2**16
        return self._yaml

    def __call__(self, d):
        pass_nr = 0
        changed = [True]
        while changed[0]:
            changed[0] = False
            try:
                s = self.yaml.dumps(d)
            except AttributeError:
                print("use 'pip install ruamel.yaml.string' to install plugin that gives 'dumps' to string")
                sys.exit(1)
            if self._verbose > 1:
                print(s)
            too_long = set()
            max_ll = -1
            for line_nr, line in enumerate(s.splitlines()):
                if len(line) > self._width:
                    too_long.add(line_nr)
                if len(line) > max_ll:
                    max_ll = len(line)
            if self._verbose > 0:
                print(f'pass: {pass_nr}, lines: {sorted(too_long)}, longest: {max_ll}')
                sys.stdout.flush()
            new_d = self.yaml.load(s)
            self.change_to_block(new_d, too_long, changed, only_first=self._only_first)
            d = new_d
            pass_nr += 1
        return d, s

    @staticmethod
    def change_to_block(d, too_long, changed, only_first):
        if isinstance(d, dict):
            if d.fa.flow_style() and d.lc.line in too_long:
                d.fa.set_block_style()
                changed[0] = True
                return  # don't convert nested flow styles, might not be necessary
            # don't change keys if any value is changed
            for v in d.values():
                Blockify.change_to_block(v, too_long, changed, only_first)
                if only_first and changed[0]:
                    return
            if changed[0]:  # don't change keys if value has changed
                return
            for k in d:
                Blockify.change_to_block(k, too_long, changed, only_first)
                if only_first and changed[0]:
                    return
        if isinstance(d, (list, tuple)):
            if d.fa.flow_style() and d.lc.line in too_long:
                d.fa.set_block_style()
                changed[0] = True
                return  # don't convert nested flow styles, might not be necessary
            for elem in d:
                Blockify.change_to_block(elem, too_long, changed, only_first)
                if only_first and changed[0]:
                    return

blockify = Blockify(96, verbose=2) # set verbose to 0, to suppress progress output
yaml = ruamel.yaml.YAML(typ=['rt', 'string'])
data = yaml.load(yaml_str)
blockified_data, string_output = blockify(data)
print('-'*32, 'result:', '-'*32)
print(string_output) # string_output has no final newline
which gives:
movie: bladerunner
quote: {[Batty, Roy]: [I have seen things you people wouldn't believe., Attack ships on fire off the shoulder of Orion., I watched C-beams glitter in the dark near the Tannhäuser Gate.]}
pass: 0, lines: [1], longest: 186
movie: bladerunner
quote:
  [Batty, Roy]: [I have seen things you people wouldn't believe., Attack ships on fire off the shoulder of Orion., I watched C-beams glitter in the dark near the Tannhäuser Gate.]
pass: 1, lines: [2], longest: 179
movie: bladerunner
quote:
  [Batty, Roy]:
  - I have seen things you people wouldn't believe.
  - Attack ships on fire off the shoulder of Orion.
  - I watched C-beams glitter in the dark near the Tannhäuser Gate.
pass: 2, lines: [], longest: 67
-------------------------------- result: --------------------------------
movie: bladerunner
quote:
  [Batty, Roy]:
  - I have seen things you people wouldn't believe.
  - Attack ships on fire off the shoulder of Orion.
  - I watched C-beams glitter in the dark near the Tannhäuser Gate.
Please note that when using ruamel.yaml<0.18, the sequence [Batty, Roy] will never be converted to block style, because the tuple subclass CommentedKeySeq never gets a line number attached.