pyparsing key-value pairs with quotes and line continuation
Using the pyparsing module, I am able to parse key/value pairs from an input file. They can look like the following:
key1=value1
key2="value2"
key3="value3 and some more text
"
key4="value4 and ""inserted quotes"" with
more text"
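I use the following rules:
from pyparsing import Group, Literal, QuotedString

# `key` is defined elsewhere in my grammar (not shown)
eq = Literal('=').suppress()
v1 = QuotedString('"')
v2 = QuotedString('"', multiline=True, escQuote='""')
value = Group(v1 | v2)("value")
kv = Group(key + eq + value)("key_value")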
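I now have a problem with quotes being used as line continuation (!!!) inside quoted text. Note that here the quotes appear within a key_value pair not as an escape character, but as a means of joining two adjacent lines.
Example: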
key5="some more text that is so long that the authors who serialized it to a file thought it"
"would be a good idea to to concatenate strings this way"
Is there a clean way to handle this, or should I first try to identify these cases and replace this concatenation method with something else?
First, your v2 expression is actually a superset of your v1 expression. That is, anything that matches v1 will also match v2, so you don't really need value = v1 | v2; value = v2 is enough.
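To illustrate the superset claim, here is a minimal sketch (the sample strings are made up for this illustration) that feeds the same simple quoted string to both expressions, and then a multiline one that only v2 accepts:

```python
from pyparsing import ParseException, QuotedString

v1 = QuotedString('"')
v2 = QuotedString('"', multiline=True, escQuote='""')

simple = '"value2"'
# Both expressions accept a plain single-line quoted string.
simple_v1 = v1.parseString(simple)[0]
simple_v2 = v2.parseString(simple)[0]

multi = '"value3 and some more text\n"'
# Only v2 accepts a quoted string that spans a line break.
multi_v2 = v2.parseString(multi)[0]
try:
    v1.parseString(multi)
    v1_accepts_multi = True
except ParseException:
    v1_accepts_multi = False

print(simple_v1, simple_v2, repr(multi_v2), v1_accepts_multi)
```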
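Then, to handle multiple "adjacent" quoted strings, instead of parsing a single quoted string, parse one or more of them and use a parse action to concatenate them: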
v2 = OneOrMore(QuotedString('"', multiline=True, escQuote='""'))
# add a parse action to convert multiple matched quoted strings to a single
# concatenated string
v2.addParseAction(''.join)
value = v2
# I made a slight change in this expression, moving the results names
# down into this compositional expression
kv = Group(key("key") + eq + value("value"))("key_value")
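For reference, the pieces above can be put together into a self-contained sketch. The question never shows how `key` is defined, so the `Word(alphanums + "_")` definition below is an assumption, as is the sample `source` text. The parse action is written as a lambda, which should behave the same as passing `''.join` directly.

```python
from pyparsing import Group, Literal, OneOrMore, QuotedString, Word, alphanums

# Assumption: keys are simple identifiers; the original `key` is not shown.
key = Word(alphanums + "_")
eq = Literal("=").suppress()

# One or more adjacent quoted strings, joined into a single value.
v2 = OneOrMore(QuotedString('"', multiline=True, escQuote='""'))
v2.addParseAction(lambda tokens: "".join(tokens))
value = v2

kv = Group(key("key") + eq + value("value"))("key_value")

# Hypothetical sample input, modeled on the examples in the question.
source = '''key2="value2"
key4="value4 and ""inserted quotes"" with
more text"
key5="first part, "
"second part"
'''

# Each match holds one Group; collect the pairs into a plain dict.
parsed = {}
for match in kv.searchString(source):
    pair = match[0]
    parsed[pair["key"]] = pair["value"]

for name, text in parsed.items():
    print(name, "=>", repr(text))
```

Note that the pieces are concatenated exactly as written, so any separating space has to be inside one of the quoted strings.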
With this test code:
for parsed_kv in kv.searchString(source):
    print(parsed_kv.dump())
    print()
this prints:
[['key2', 'value2']]
- key_value: ['key2', 'value2']
  - key: 'key2'
  - value: 'value2'
[0]:
  ['key2', 'value2']
  - key: 'key2'
  - value: 'value2'
[['key3', 'value3 and some more text\n']]
- key_value: ['key3', 'value3 and some more text\n']
  - key: 'key3'
  - value: 'value3 and some more text\n'
[0]:
  ['key3', 'value3 and some more text\n']
  - key: 'key3'
  - value: 'value3 and some more text\n'
[['key4', 'value4 and "inserted quotes" with\nmore text']]
- key_value: ['key4', 'value4 and "inserted quotes" with\nmore text']
  - key: 'key4'
  - value: 'value4 and "inserted quotes" with\nmore text'
[0]:
  ['key4', 'value4 and "inserted quotes" with\nmore text']
  - key: 'key4'
  - value: 'value4 and "inserted quotes" with\nmore text'
[['key5', 'some more text that is so long that the authors who serialized it to a file thought it would be a good idea to to concatenate strings this way']]
- key_value: ['key5', 'some more text that is so long that the authors who serialized it to a file thought it would be a good idea to to concatenate strings this way']
  - key: 'key5'
  - value: 'some more text that is so long that the authors who serialized it to a file thought it would be a good idea to to concatenate strings this way'
[0]:
  ['key5', 'some more text that is so long that the authors who serialized it to a file thought it would be a good idea to to concatenate strings this way']
  - key: 'key5'
  - value: 'some more text that is so long that the authors who serialized it to a file thought it would be a good idea to to concatenate strings this way'