pyparsing key value pairs with quotes and line continuation

Using the pyparsing module, I am able to parse key/value pairs from an input file. They can look like the following:

key1=value1
key2="value2"
key3="value3 and some more text
"
key4="value4 and ""inserted quotes"" with
more text"

I parse them with the following rules:

eq = Literal('=').suppress()
v1 = QuotedString('"')
v2 = QuotedString('"', multiline=True, escQuote='""')
value = Group(v1 | v2)("value")
kv = Group(key + eq + value)("key_value")

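I now have a problem with quotes being used as a line continuation (!!!) inside a quoted piece of text. Note that in this case the quotes within the key/value pair are not acting as an escape character; they are a means of joining two adjacent lines.

Example: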

key5="some more text that is so long that the authors who serialized it to a file thought it"
"would be a good idea to to concatenate strings this way"

Is there a way to handle this cleanly, or should I first try to identify these cases and replace this concatenation method with something else?

First, your v2 expression is actually a superset of your v1 expression. That is, anything that matches v1 will also match v2, so you don't really need value = v1 | v2; value = v2 is enough.
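As a quick check of that point, here is a minimal sketch that uses only the v2 expression (with the keyword spelled multiline) on three values shaped like the ones in the question. The variable names simple, multi, and escaped are mine, chosen for illustration.

```python
from pyparsing import QuotedString

# The v2 expression on its own; no v1 alternative is involved.
v2 = QuotedString('"', multiline=True, escQuote='""')

# A plain single-line quoted value.
simple = v2.parseString('"value2"')[0]

# A quoted value whose closing quote is on the next line.
multi = v2.parseString('"value3 and some more text\n"')[0]

# A multi-line value containing doubled quotes as escaped quotes.
escaped = v2.parseString('"value4 and ""inserted quotes"" with\nmore text"')[0]

print(repr(simple))
print(repr(multi))
print(repr(escaped))
```

All three forms are accepted by the one expression, which is why the v1 alternative adds nothing.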

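Then, to handle the case of multiple "adjacent" quoted strings, instead of parsing a single quoted string, parse one or more of them, and then use a parse action to concatenate them: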

v2 = OneOrMore(QuotedString('"', multiline=True, escQuote='""'))

# add a parse action to convert multiple matched quoted strings to a single
# concatenated string
v2.addParseAction(''.join)

value = v2

# I made a slight change in this expression, moving the results names
# down into this compositional expression
kv = Group(key("key") + eq + value("value"))("key_value")

Using this test code:

for parsed_kv in kv.searchString(source):
    print(parsed_kv.dump())
    print()

prints:

[['key2', 'value2']]
- key_value: ['key2', 'value2']
  - key: 'key2'
  - value: 'value2'
[0]:
  ['key2', 'value2']
  - key: 'key2'
  - value: 'value2'

[['key3', 'value3 and some more text\n']]
- key_value: ['key3', 'value3 and some more text\n']
  - key: 'key3'
  - value: 'value3 and some more text\n'
[0]:
  ['key3', 'value3 and some more text\n']
  - key: 'key3'
  - value: 'value3 and some more text\n'

[['key4', 'value4 and "inserted quotes" with\nmore text']]
- key_value: ['key4', 'value4 and "inserted quotes" with\nmore text']
  - key: 'key4'
  - value: 'value4 and "inserted quotes" with\nmore text'
[0]:
  ['key4', 'value4 and "inserted quotes" with\nmore text']
  - key: 'key4'
  - value: 'value4 and "inserted quotes" with\nmore text'

[['key5', 'some more text that is so long that the authors who serialized it to a file thought it would be a good idea to to concatenate strings this way']]
- key_value: ['key5', 'some more text that is so long that the authors who serialized it to a file thought it would be a good idea to to concatenate strings this way']
  - key: 'key5'
  - value: 'some more text that is so long that the authors who serialized it to a file thought it would be a good idea to to concatenate strings this way'
[0]:
  ['key5', 'some more text that is so long that the authors who serialized it to a file thought it would be a good idea to to concatenate strings this way']
  - key: 'key5'
  - value: 'some more text that is so long that the authors who serialized it to a file thought it would be a good idea to to concatenate strings this way'
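For anyone who wants to run this end to end, the pieces above can be put together roughly as follows. This is only a sketch under a few assumptions: the question never shows how key is defined, so the Word(alphanums + "_") definition here is a guess, and the source text and the results dictionary are shortened stand-ins of my own rather than the original input file.

```python
from pyparsing import Group, Literal, OneOrMore, QuotedString, Word, alphanums

# Assumption: keys are runs of letters, digits, and underscores.
key = Word(alphanums + "_")
eq = Literal("=").suppress()

# One or more adjacent quoted strings, joined into a single value.
value = OneOrMore(QuotedString('"', multiline=True, escQuote='""'))
value.addParseAction("".join)

kv = Group(key("key") + eq + value("value"))("key_value")

# Shortened sample input (hypothetical, modeled on the question's format).
source = '''\
key2="value2"
key4="value4 and ""inserted quotes"" with
more text"
key5="first part, "
"second part"
'''

# Each match holds one Group; index 0 is that group with its named fields.
results = {m[0]["key"]: m[0]["value"] for m in kv.searchString(source)}
for name, text in results.items():
    print(name, repr(text))
```

Note that ''.join inserts nothing between the pieces, so any space separating them has to be inside one of the quoted strings, as with the trailing space after "first part," in this sample.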