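C# parse and change strings in yaml
I'm looking for a way to parse a yaml file and change each string, then save the file without changing the structure of the original file. In my opinion I shouldn't use a regex for this, but some kind of yaml parser.
Sample yaml input below: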
receipt: Oz-Ware Purchase Invoice
date: 2007-08-06
customer:
    given: Dorothy
items:
    - part_no: A4786
      descrip: Water Bucket (Filled)
    - part_no: E1628
      descrip: High Heeled "Ruby" Slippers
      size: 8
bill-to: &id001
    street: |
        123 Tornado Alley
        Suite 16
    city: East Centerville
    state: KS
ship-to: *id001
specialDelivery: >
    Follow the Yellow Brick
    Road to the Emerald City.
...
Desired output:
receipt: ###Oz-Ware Purchase Invoice###
date: ###2007-08-06###
customer:
    given: ###Dorothy###
items:
    - part_no: ###A4786###
      descrip: ###Water Bucket (Filled)###
    - part_no: ###E1628###
      descrip: ###High Heeled "Ruby" Slippers###
      size: ###8###
bill-to: ###&id001###
    street: |
        ###123 Tornado Alley
        Suite 16###
    city: ###East Centerville###
    state: ###KS###
ship-to: ###*id001###
specialDelivery: >
    ###Follow the Yellow Brick
    Road to the Emerald City.###
...
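Is there a good yaml parser that can handle complex yaml files, alter strings, and save the data back without ruining the structure of the document? Maybe you have another idea for how to solve this. Basically I want to walk through every string from the top of the document and make some modification to it.
Any hints appreciated.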
The YAML specification has this to say:
In the representation model, mapping keys do not have an order. To serialize a mapping, it is necessary to impose an ordering on its keys. This order is a serialization detail and should not be used when composing the representation graph (and hence for the preservation of application data). In every case where node order is significant, a sequence must be used. For example, an ordered mapping can be represented as a sequence of mappings, where each mapping is a single key: value pair. YAML provides convenient compact notation for this case.
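So you really shouldn't expect YAML to maintain any order when loading and saving documents.
That being said, I perfectly understand where you are coming from. Since YAML documents are meant for humans, maintaining a certain order definitely helps. Unfortunately, because of the specification, most implementations will use unordered data structures to represent key/value mappings. In C# and Python, that would be a dictionary, and dictionaries are unordered by design (in Python, a plain dict only guarantees insertion order since version 3.7).
But both C# and Python do have ordered dictionary types, OrderedDictionary and OrderedDict, and at least for Python, there have been some efforts in the past to maintain key order using ordered dictionaries:
- The !!omap type is a special ordered mapping. PyYAML supports it.
- PyYAML ticket 29 talks about possibly including an OrderedLoader. There is also a short workaround using YAML constructors in between, and a possible implementation of a loader at the end.
- PyYAML ticket 161 has a recipe that provides this as well.
- Finally, this other Stack Overflow question covers loading YAML into OrderedDicts.
That's the Python side; I'm sure there are similar efforts for C# implementations too.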
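The spec's suggestion quoted above, representing an ordered mapping as a sequence of single-pair mappings, can be sketched in plain Python. This is only an illustration and assumes the YAML sequence has already been loaded into a list of dicts; the helper name pairs_to_ordered is hypothetical and not part of any YAML library.

```python
from collections import OrderedDict


def pairs_to_ordered(seq):
    """Turn a list of single-pair dicts into an OrderedDict, keeping list order."""
    ordered = OrderedDict()
    for item in seq:
        if len(item) != 1:
            raise ValueError("each entry must hold exactly one key: value pair")
        # Unpack the only (key, value) pair of this entry.
        (key, value), = item.items()
        ordered[key] = value
    return ordered


# Roughly what a loader returns for the YAML sequence
#   - receipt: Oz-Ware Purchase Invoice
#   - given: Dorothy
#   - city: East Centerville
loaded = [
    {"receipt": "Oz-Ware Purchase Invoice"},
    {"given": "Dorothy"},
    {"city": "East Centerville"},
]

invoice = pairs_to_ordered(loaded)
print(list(invoice))  # ['receipt', 'given', 'city']
```

Because the order lives in the sequence, it survives any loader, at the cost of a slightly noisier document.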
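Most YAML parsers are built for reading YAML, either written by other programs or edited by humans, and for writing YAML to be read by other programs. What is notoriously lacking is the ability of parsers to write YAML that is still readable by humans:
- the order of mapping keys is undefined
- comments get thrown away
- the scalar literal block style, if any, is dropped
- spacing around scalars is discarded
- the scalar folding information, if any, is dropped
Loading a dump of a handcrafted YAML file results in the same internal data structures as the initial load, but the intermediate dump normally doesn't look like the original (handcrafted) YAML.
If you have a Python program: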
import ruamel.yaml as yaml

# Note: this is ruamel.yaml's older function-style API; newer releases
# favor the YAML() class instead.
yaml_str = """\
receipt: Oz-Ware Purchase Invoice
date: 2007-08-06
customer:
    given: Dorothy
items:
    - part_no: A4786
      descrip: Water Bucket (Filled)
    - part_no: E1628
      descrip: High Heeled "Ruby" Slippers
      size: 8
bill-to: &id001
    street: |
        123 Tornado Alley
        Suite 16
    city: East Centerville
    state: KS
ship-to: *id001
specialDelivery: >
    Follow the Yellow Brick
    Road to the Emerald City.
"""

# Load, dump, and load again with the plain (non-round-trip) classes.
data1 = yaml.load(yaml_str, Loader=yaml.Loader)
dump_str = yaml.dump(data1, Dumper=yaml.Dumper)
data2 = yaml.load(dump_str, Loader=yaml.Loader)
then the following assertions hold:
assert data1 == data2
assert dump_str != yaml_str
The intermediate dump_str looks like:
bill-to: &id001 {city: East Centerville, state: KS, street: '123 Tornado Alley
Suite 16
'}
customer: {given: Dorothy}
date: 2007-08-06
items:
- {descrip: Water Bucket (Filled), part_no: A4786}
- {descrip: High Heeled "Ruby" Slippers, part_no: E1628, size: 8}
receipt: Oz-Ware Purchase Invoice
ship-to: *id001
specialDelivery: 'Follow the Yellow Brick Road to the Emerald City.
'
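The above is the default behavior for ruamel.yaml, PyYAML, and many YAML parsers in other languages, as well as for online YAML conversion services. For some parsers this is the only behavior provided.
The reason I started ruamel.yaml as an enhancement of PyYAML was to make going from handcrafted YAML to internal data and back to YAML produce something more human-readable (what I call round-tripping), and to preserve more information (especially comments).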
data = yaml.load(yaml_str, Loader=yaml.RoundTripLoader)
print(yaml.dump(data, Dumper=yaml.RoundTripDumper))
gives you:
receipt: Oz-Ware Purchase Invoice
date: 2007-08-06
customer:
  given: Dorothy
items:
- part_no: A4786
  descrip: Water Bucket (Filled)
- part_no: E1628
  descrip: High Heeled "Ruby" Slippers
  size: 8
bill-to: &id001
  street: |
    123 Tornado Alley
    Suite 16
  city: East Centerville
  state: KS
ship-to: *id001
specialDelivery: 'Follow the Yellow Brick Road to the Emerald City.
'
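My focus has been on comments, key order, and the literal block style. Spacing around scalars and folded scalars get no special treatment (yet).
Starting from there (you could do this in PyYAML as well, but you would not have the built-in enhancements of ruamel.yaml's key-order preservation) you can either provide special emitters, or hook into the system at a lower level, overriding some methods in emitter.py (and making sure you can call the originals for the cases you don't need to handle):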
def rewrite_write_plain(self, text, split=True):
    # Decorate plain scalars that are values of a block mapping.
    if self.state == self.expect_block_mapping_simple_value:
        text = '###' + text + '###'
        while self.column < 20:
            text = ' ' + text
            self.column += 1
    self._org_write_plain(text, split)

def rewrite_write_literal(self, text):
    # Decorate literal block scalars, keeping the trailing newline outside.
    if self.state == self.expect_block_mapping_simple_value:
        last_nl = False
        if text and text[-1] == '\n':
            last_nl = True
            text = text[:-1]
        text = '###' + text + '###'
        if False:  # disabled: optional extra alignment of the block
            extra_indent = ''
            while self.column < 15:
                text = ' ' + text
                extra_indent += ' '
                self.column += 1
            text = text.replace('\n', '\n' + extra_indent)
        if last_nl:
            text += '\n'
    self._org_write_literal(text)

def rewrite_write_single_quoted(self, text, split=True):
    # Emit what would be single-quoted as a folded scalar instead.
    if self.state == self.expect_block_mapping_simple_value:
        last_nl = False
        if text and text[-1] == u'\n':
            last_nl = True
            text = text[:-1]
        text = u'###' + text + u'###'
        if last_nl:
            text += u'\n'
    self.write_folded(text)

def rewrite_write_indicator(self, indicator, need_whitespace,
                            whitespace=False, indention=False):
    # Decorate anchors (&) and aliases (*).
    if indicator and indicator[0] in u"*&":
        indicator = u'###' + indicator + u'###'
        while self.column < 20:
            indicator = ' ' + indicator
            self.column += 1
    self._org_write_indicator(indicator, need_whitespace, whitespace,
                              indention)

# Assumption: `dumper` is the round-trip dumper class; the snippet as
# posted here does not define it.
dumper = yaml.RoundTripDumper

# Keep the originals reachable, then install the wrappers on the class.
dumper._org_write_plain = dumper.write_plain
dumper.write_plain = rewrite_write_plain
dumper._org_write_literal = dumper.write_literal
dumper.write_literal = rewrite_write_literal
dumper._org_write_single_quoted = dumper.write_single_quoted
dumper.write_single_quoted = rewrite_write_single_quoted
dumper._org_write_indicator = dumper.write_indicator
dumper.write_indicator = rewrite_write_indicator

print(yaml.dump(data, Dumper=dumper, indent=4))
gives you:
receipt: ###Oz-Ware Purchase Invoice###
date: ###2007-08-06###
customer:
    given: ###Dorothy###
items:
-   part_no: ###A4786###
    descrip: ###Water Bucket (Filled)###
-   part_no: ###E1628###
    descrip: ###High Heeled "Ruby" Slippers###
    size: ###8###
bill-to: ###&id001###
    street: |
        ###123 Tornado Alley
        Suite 16###
    city: ###East Centerville###
    state: ###KS###
ship-to: ###*id001###
specialDelivery: >
    ###Follow the Yellow Brick Road to the Emerald City.###
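The patching lines above all follow one pattern: store the original method under a new name, then install a wrapper that decorates its input and delegates to the stored original. A minimal sketch of that pattern in isolation, using a made-up ToyEmitter class (hypothetical, not ruamel.yaml's emitter) so it runs without any YAML library:

```python
class ToyEmitter:
    """Stand-in for a YAML emitter; hypothetical, not ruamel.yaml's class."""

    def __init__(self):
        self.out = []

    def write_plain(self, text):
        self.out.append(text)


def rewrite_write_plain(self, text):
    # Decorate the scalar, then delegate to the saved original method.
    self._org_write_plain('###' + text + '###')


# Keep a reference to the original, then install the wrapper on the class.
ToyEmitter._org_write_plain = ToyEmitter.write_plain
ToyEmitter.write_plain = rewrite_write_plain

emitter = ToyEmitter()
emitter.write_plain('Dorothy')
print(emitter.out)  # ['###Dorothy###']
```

Note that patching the class affects every instance created afterwards, which is also true for the dumper patched above.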
which hopefully is acceptable for further processing in C#.
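If anchors, aliases, and block styles do not need to be decorated, a simpler alternative to emitter hooks is to walk the loaded data and wrap every string value before dumping. The sketch below is an assumption-laden illustration, not part of the approach above: decorate is a hypothetical helper, it works on plain dicts, OrderedDicts, and lists, it leaves keys and non-string scalars (numbers, dates) untouched, and it would not carry over comments held by a round-trip structure.

```python
from collections import OrderedDict


def decorate(node, wrap=lambda s: '###' + s + '###'):
    """Return a copy of node with every string value wrapped; keys untouched."""
    if isinstance(node, dict):
        # Rebuild with the same mapping type so an OrderedDict keeps its order.
        return type(node)((key, decorate(value, wrap))
                          for key, value in node.items())
    if isinstance(node, list):
        return [decorate(item, wrap) for item in node]
    if isinstance(node, str):
        return wrap(node)
    return node  # numbers, dates, None, ... are left as they are


data = OrderedDict([
    ('receipt', 'Oz-Ware Purchase Invoice'),
    ('customer', OrderedDict([('given', 'Dorothy')])),
    ('items', [OrderedDict([('part_no', 'A4786'), ('size', 8)])]),
])

result = decorate(data)
print(result['customer']['given'])  # ###Dorothy###
```

Unlike the desired output in the question, this leaves the integer 8 undecorated, because after loading it is no longer a string.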