Non-greedy list parsing with pyparsing
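I have a string consisting of a list of words, which I am trying to parse with pyparsing.
The list always contains at least three items. From it, I want pyparsing to produce three groups: the first should contain every word up to the last two items, and the other two groups should be those last two items. For example: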
"one two three four"
should be parsed into something like:
["one two"], "three", "four"
I can do this with a regular expression:
import pyparsing as pp
data = "one two three four"
grammar = pp.Regex(r"(?P<first>(\w+\W?)+)\s(?P<penultimate>\w+) (?P<ultimate>\w+)")
print(grammar.parseString(data).dump())
which gives:
['one two three four']
- first: one two
- penultimate: three
- ultimate: four
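For comparison, the same split can be done with only the standard-library `re` module. This is a sketch, not the asker's code: it swaps the nested `(\w+\W?)+` for a lazy first group plus an end-of-string anchor, assumes the words are separated by whitespace, and the helper name `split_last_two` is hypothetical.

```python
import re

# Lazy "first" group followed by exactly two trailing words; the $ anchor
# forces the regex engine to leave precisely two words for the last groups.
PATTERN = re.compile(
    r"(?P<first>\w+(?:\s+\w+)*?)\s+(?P<penultimate>\w+)\s+(?P<ultimate>\w+)$"
)

def split_last_two(text):
    """Hypothetical helper: return (first, penultimate, ultimate), or None
    if the text has fewer than three words."""
    m = PATTERN.match(text.strip())
    if m is None:
        return None
    return m.group("first"), m.group("penultimate"), m.group("ultimate")

print(split_last_two("one two three four"))  # ('one two', 'three', 'four')
print(split_last_two("a b"))                 # None
```

This works because a regex engine backtracks until the whole pattern matches, which is exactly the behavior pyparsing's repetition elements do not have.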
My problem is that, because of pyparsing's greedy nature, I can't get the same result with non-Regex ParserElements, for example:
import pyparsing as pp
data = "one two three four"
word = pp.Word(pp.alphas)
grammar = pp.Group(pp.OneOrMore(word))("first") + word("penultimate") + word("ultimate")
grammar.parseString(data)
fails with the traceback:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/site-packages/pyparsing.py", line 1125, in parseString
raise exc
pyparsing.ParseException: Expected W:(abcd...) (at char 18), (line:1, col:19)
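because OneOrMore swallows all of the words in the list. So far my attempts to prevent this greedy behavior with FollowedBy or NotAny have failed. Any suggestions on how to get the desired behavior?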
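To see the failure concretely, the sketch below (assuming a pyparsing version that still provides the camelCase `parseString`/`asList` names, which both 2.x and 3.x do) shows that `OneOrMore(word)` on its own consumes every word, and that the full grammar then has nothing left for `penultimate`.

```python
import pyparsing as pp

word = pp.Word(pp.alphas)
data = "one two three four"

# On its own, OneOrMore(word) keeps matching until it runs out of words.
consumed = pp.OneOrMore(word).parseString(data).asList()
print(consumed)  # ['one', 'two', 'three', 'four']

# In the full grammar, pyparsing does not go back and make OneOrMore give up
# words it already matched, so "penultimate" finds nothing and the parse fails.
grammar = pp.Group(pp.OneOrMore(word))("first") + word("penultimate") + word("ultimate")
try:
    grammar.parseString(data)
    failed = False
except pp.ParseException:
    failed = True
print(failed)  # True
```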
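Well, your OneOrMore expression just needs to be tightened up a bit; you're on the right track with FollowedBy. You don't really want OneOrMore(word), you want "OneOrMore(word that is followed by at least 2 more words)". To add this kind of lookahead in pyparsing, you can even use the new '*' multiplication operator to specify the lookahead count: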
grammar = pp.Group(pp.OneOrMore(word + pp.FollowedBy(word*2)))("first") + word("penultimate") + word("ultimate")
Dumping the result now gives the desired output:
[['one', 'two'], 'three', 'four']
- first: ['one', 'two']
- penultimate: three
- ultimate: four
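A self-contained version of the grammar above, wrapped in a small function so it can be tried on lists of different lengths, might look like the following. The helper name `split_words` is hypothetical, and `parseAll=True` is an addition not in the answer, used here so that trailing unparsed text is rejected.

```python
import pyparsing as pp

word = pp.Word(pp.alphas)

# A word only joins "first" if at least two more words follow it. FollowedBy
# is a lookahead: it checks for word*2 without consuming any input, so those
# two words remain available for "penultimate" and "ultimate".
first = pp.Group(pp.OneOrMore(word + pp.FollowedBy(word * 2)))("first")
grammar = first + word("penultimate") + word("ultimate")

def split_words(text):
    """Hypothetical helper: return (first_words, penultimate, ultimate)."""
    result = grammar.parseString(text, parseAll=True)
    return result["first"].asList(), result["penultimate"], result["ultimate"]

for sample in ["one two three four", "a b c", "alpha beta gamma delta epsilon"]:
    print(split_words(sample))

# Fewer than three words: OneOrMore cannot match even once, so parsing fails.
try:
    split_words("too short")
except pp.ParseException as err:
    print("rejected:", err)
```

With three words, "first" holds a single word; with two, the lookahead never succeeds and a ParseException is raised, which matches the "at least three items" constraint from the question.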