How to capture everything before an operator in Pyparsing
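See also: Pyparsing problem with operators
I am trying to create a pyparsing grammar. I want to capture the space-separated entities
that appear before the operators "and"/"or" as a single word.
The expected result is: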
(United kingdom or Sweden)
['United kingdom','or','Sweden']
What I get is:
['United', 'kingdom','or','Sweden']
My code so far:
from pyparsing import *
import json
QUOTED = quotedString.setParseAction(removeQuotes)
OAND = CaselessLiteral("and")
OOR = CaselessLiteral("or")
ONOT = CaselessLiteral("not")
WORDWITHSPACE = Combine(OneOrMore(Word(printables.replace("(", "").replace(")", "")) | White(
' ') + ~(White() | OAND | ONOT | OOR)))
TERM = (QUOTED | WORDWITHSPACE)
EXPRESSION = operatorPrecedence(TERM,
[
(ONOT, 1, opAssoc.RIGHT),
(OAND, 2, opAssoc.LEFT),
(OOR, 2, opAssoc.LEFT)
])
STRING = OneOrMore(EXPRESSION) + StringEnd()
I redefined WORDWITHSPACE as follows:
# space-separated words are easiest to define using just OneOrMore
# must use a negative lookahead for and/not/or operators, and this must come
# at the beginning of the expression
WORDWITHSPACE = OneOrMore(~(OAND | ONOT | OOR) + Word(printables, excludeChars="()"))
# use a parse action to recombine words into a single string
WORDWITHSPACE.addParseAction(' '.join)
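To see why the negative lookahead has to sit in front of each word, here is a minimal sketch that parses the same input with and without it. The names `word`, `join_words`, `greedy` and `guarded` are only illustrative and are not part of the original code; the operators are defined as in the question.

```python
from pyparsing import CaselessLiteral, OneOrMore, Word, printables

OAND = CaselessLiteral("and")
OOR = CaselessLiteral("or")
ONOT = CaselessLiteral("not")

# any run of printable characters except parentheses
word = Word(printables, excludeChars="()")


def join_words(tokens):
    # recombine the matched words into a single string
    return " ".join(tokens)


# without a lookahead, "or" is just another word and gets swallowed
greedy = OneOrMore(word).addParseAction(join_words)

# with the lookahead checked before every word, matching stops at an operator
guarded = OneOrMore(~(OAND | ONOT | OOR) + word).addParseAction(join_words)

text = "United Kingdom or Sweden"
greedy_result = greedy.parseString(text).asList()
guarded_result = guarded.parseString(text).asList()

print(greedy_result)   # ['United Kingdom or Sweden']
print(guarded_result)  # ['United Kingdom']
```

The guarded version leaves "or Sweden" unparsed, which is what lets the surrounding operator grammar pick up the operator and the next term.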
With these changes to your code sample, I was able to write:
tests = """
# basic test
United Kingdom or Sweden
# multiple operators at the same precedence level
United Kingdom or Sweden or France
# implicit grouping by precedence - 'and' is higher prec than 'or'
United Kingdom or Sweden and People's Republic of China
# use ()'s to override precedence of 'and' over 'or'
(United Kingdom or Sweden) and People's Republic of China
"""
EXPRESSION.runTests(tests, fullDump=False)
and get:
# basic test
United Kingdom or Sweden
[['United Kingdom', 'or', 'Sweden']]
# multiple operators at the same precedence level
United Kingdom or Sweden or France
[['United Kingdom', 'or', 'Sweden', 'or', 'France']]
# implicit grouping by precedence - 'and' is higher prec than 'or'
United Kingdom or Sweden and People's Republic of China
[['United Kingdom', 'or', ['Sweden', 'and', "People's Republic of China"]]]
# use ()'s to override precedence of 'and' over 'or'
(United Kingdom or Sweden) and People's Republic of China
[[['United Kingdom', 'or', 'Sweden'], 'and', "People's Republic of China"]]
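For anyone who wants to check these results programmatically rather than with runTests, the pieces above can be put together roughly as follows. This is a sketch under a few assumptions: it uses infixNotation, the newer name for operatorPrecedence, and the helper `parse` is hypothetical, added only for this illustration.

```python
from pyparsing import (
    CaselessLiteral, OneOrMore, Word, infixNotation, opAssoc,
    printables, quotedString, removeQuotes,
)

# copy() so the parse action is not attached to the shared quotedString object
QUOTED = quotedString.copy().setParseAction(removeQuotes)
OAND = CaselessLiteral("and")
OOR = CaselessLiteral("or")
ONOT = CaselessLiteral("not")

# words separated by spaces, stopping in front of any operator
WORDWITHSPACE = OneOrMore(~(OAND | ONOT | OOR) + Word(printables, excludeChars="()"))
WORDWITHSPACE.addParseAction(lambda tokens: " ".join(tokens))

TERM = QUOTED | WORDWITHSPACE
EXPRESSION = infixNotation(
    TERM,
    [
        (ONOT, 1, opAssoc.RIGHT),
        (OAND, 2, opAssoc.LEFT),
        (OOR, 2, opAssoc.LEFT),
    ],
)


def parse(text):
    # hypothetical helper: parse the whole string and return plain nested lists
    return EXPRESSION.parseString(text, parseAll=True).asList()


basic = parse("United Kingdom or Sweden")
grouped = parse("(United Kingdom or Sweden) and People's Republic of China")
print(basic)
print(grouped)
```

One caveat: CaselessLiteral matches a prefix without checking word boundaries, so a name that begins with an operator (for example "Andorra" or "Oregon") is rejected by the lookahead. Defining the operators with CaselessKeyword instead is one way to avoid that.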