Access parsed elements using Pyparsing
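I have a bunch of sentences that I need to parse and convert into corresponding regex search code. Examples of my sentences:
LINE_CONTAINS phrase one BEFORE {phrase2 AND phrase3} AND LINE_STARTSWITH Therefore we
- This means that, in the line, phrase one appears somewhere before phrase2 and phrase3. In addition, the line must start with Therefore we.
LINE_CONTAINS abc {upto 4 words} xyz {upto 3 words} pqr
- This means I need to allow up to 4 words between the first two phrases and up to 3 words between the last two phrases.
With Paul McGuire's help, I wrote the following grammar: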
from pyparsing import (CaselessKeyword, Word, alphanums, nums, MatchFirst, quotedString,
                       infixNotation, Combine, opAssoc, Suppress, pyparsing_common, Group,
                       OneOrMore, ZeroOrMore)

LINE_CONTAINS, LINE_STARTSWITH, LINE_ENDSWITH = map(CaselessKeyword,
    """LINE_CONTAINS LINE_STARTSWITH LINE_ENDSWITH""".split())
NOT, AND, OR = map(CaselessKeyword, "NOT AND OR".split())
BEFORE, AFTER, JOIN = map(CaselessKeyword, "BEFORE AFTER JOIN".split())

lpar = Suppress('{')
rpar = Suppress('}')

keyword = MatchFirst([LINE_CONTAINS, LINE_STARTSWITH, LINE_ENDSWITH, NOT, AND, OR,
                      BEFORE, AFTER, JOIN])  # declaring all keywords and assigning order for all further use

phrase_word = ~keyword + (Word(alphanums + '_'))
upto_N_words = Group(lpar + 'upto' + pyparsing_common.integer('numberofwords') + 'words' + rpar)
phrase_term = Group(OneOrMore(phrase_word) + ZeroOrMore(upto_N_words + OneOrMore(phrase_word)))

phrase_expr = infixNotation(phrase_term,
    [
        ((BEFORE | AFTER | JOIN), 2, opAssoc.LEFT,),  # (opExpr, numTerms, rightLeftAssoc, parseAction)
        (NOT, 1, opAssoc.RIGHT,),
        (AND, 2, opAssoc.LEFT,),
        (OR, 2, opAssoc.LEFT),
    ],
    lpar=Suppress('{'), rpar=Suppress('}')
)  # structure of a single phrase with its operators

line_term = Group((LINE_CONTAINS | LINE_STARTSWITH | LINE_ENDSWITH)("line_directive") +
                  Group(phrase_expr)("phrase"))  # basically giving structure to a single sub-rule having line-term and phrase

line_contents_expr = infixNotation(line_term,
    [
        (NOT, 1, opAssoc.RIGHT,),
        (AND, 2, opAssoc.LEFT,),
        (OR, 2, opAssoc.LEFT),
    ]
)  # grammar for the entire rule/sentence

sample1 = """
LINE_CONTAINS phrase one BEFORE {phrase2 AND phrase3} AND LINE_STARTSWITH Therefore we
"""
sample2 = """
LINE_CONTAINS abcd {upto 4 words} xyzw {upto 3 words} pqrs BEFORE something else
"""
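My question now is: how do I access the parsed elements so that I can convert the sentences into my regex code? For this, I tried the following:
import pprint

parsed = line_contents_expr.parseString(sample1)  # and likewise with sample2
print(parsed[0].asDict())
print(parsed)
pprint.pprint(parsed)
The result of the above code for sample1 is: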
{}
[[['LINE_CONTAINS', [[['sentence', 'one'], 'BEFORE', [['sentence2'],
'AND', ['sentence3']]]]], 'AND', ['LINE_STARTSWITH', [['Therefore',
'we']]]]]
([([(['LINE_CONTAINS', ([([(['sentence', 'one'], {}), 'BEFORE',
([(['sentence2'], {}), 'AND', (['sentence3'], {})], {})], {})], {})],
{'phrase': [(([([(['sentence', 'one'], {}), 'BEFORE',
([(['sentence2'], {}), 'AND', (['sentence3'], {})], {})], {})], {}),
1)], 'line_directive': [('LINE_CONTAINS', 0)]}), 'AND',
(['LINE_STARTSWITH', ([(['Therefore', 'we'], {})], {})], {'phrase':
[(([(['Therefore', 'we'], {})], {}), 1)], 'line_directive':
[('LINE_STARTSWITH', 0)]})], {})], {})
The result of the above code for sample2 is:
{'phrase': [[['abcd', {'numberofwords': 4}, 'xyzw', {'numberofwords':
3}, 'pqrs'], 'BEFORE', ['something', 'else']]], 'line_directive':
'LINE_CONTAINS'}
[['LINE_CONTAINS', [[['abcd', ['upto', 4, 'words'], 'xyzw', ['upto',
3, 'words'], 'pqrs'], 'BEFORE', ['something', 'else']]]]]
([(['LINE_CONTAINS', ([([(['abcd', (['upto', 4, 'words'],
{'numberofwords': [(4, 1)]}), 'xyzw', (['upto', 3, 'words'],
{'numberofwords': [(3, 1)]}), 'pqrs'], {}), 'BEFORE', (['something',
'else'], {})], {})], {})], {'phrase': [(([([(['abcd', (['upto', 4,
'words'], {'numberofwords': [(4, 1)]}), 'xyzw', (['upto', 3, 'words'],
{'numberofwords': [(3, 1)]}), 'pqrs'], {}), 'BEFORE', (['something',
'else'], {})], {})], {}), 1)], 'line_directive': [('LINE_CONTAINS',
0)]})], {})
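My questions based on the above output are:
- Why is the pprint (pretty print) output more detailed than the normal print?
- Why does the asDict() method give no output for sample1 but does give output for sample2?
- Whenever I try to access the parsed elements using print(parsed.numberofwords), parsed.line_directive, or parsed.line_term, it gives me nothing. How can I access these elements so that I can use them to build my regex code?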
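To answer your printing questions: 1) pprint is there to pretty-print a nested list of tokens, without showing any results names; it is essentially a wrapper around calling pprint.pprint(results.asList()). 2) asDict() is there to convert your parsed results to an actual Python dict, so it shows only the results names (with nesting if you have names within names).
To view the contents of your parsed output, you are best off using print(result.dump()). dump() shows both the nesting of the results and any named results along the way.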
result = line_contents_expr.parseString(sample2)
print(result.dump())
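To make the difference between these views concrete, here is a minimal sketch on a cut-down grammar. It is not the grammar from the question: the results names first, gap and second are made up for illustration, and it assumes a pyparsing version that still provides the camelCase methods used in this thread. It shows that a results name defined inside a Group has to be reached through that group.

```python
from pyparsing import Group, Suppress, Word, alphas, pyparsing_common

# Cut-down, hypothetical grammar: word {upto N words} word
word = Word(alphas)
gap = Group(Suppress("{") + "upto"
            + pyparsing_common.integer("numberofwords")
            + "words" + Suppress("}"))
expr = word("first") + gap("gap") + word("second")

result = expr.parseString("abcd {upto 4 words} xyzw")

as_list = result.asList()            # nested tokens only, no results names
as_dict = result.asDict()            # results names only, nested by group
top_level = result.numberofwords     # the name is not defined at this level
nested = result.gap.numberofwords    # reached through the enclosing group

print(as_list)
print(as_dict)
print(repr(top_level), nested)
print(result.dump())                 # tokens and names together
```

Here result.gap.numberofwords is 4, while result.numberofwords is an empty string, because pyparsing returns an empty string for a name that is not defined at that level. The same reasoning probably explains the empty asDict() for sample1: judging by the dump output further down, parsed[0] is the group built by the top-level AND, and the names line_directive and phrase sit one level deeper, on parsed[0][0] and parsed[0][2].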
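I would also suggest using expr.runTests, which gives you dump() output as well as any exceptions and exception locators. With your code, the easiest way to use it is: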
line_contents_expr.runTests([sample1, sample2])
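But I would also suggest you step back for a second and think about what this {upto n words} business is all about. Look at your samples and draw rectangles around the line terms, and then within the line terms draw circles around the phrase terms. (This would be a good exercise leading up to writing a BNF description of this grammar for yourself, which I always recommend as a getting-your-head-around-the-problem step.) What if you treated the upto expressions as another operator? To see this, change phrase_term back to the way you had it:
phrase_term = Group(OneOrMore(phrase_word))
And then change the first precedence entry in the definition of the phrase expression to:
((BEFORE | AFTER | JOIN | upto_N_words), 2, opAssoc.LEFT,),
Or give some thought to having the upto operator at a higher or lower precedence than BEFORE, AFTER, and JOIN, and adjust the precedence list accordingly.
With this change, I get this output from calling runTests on your samples: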
LINE_CONTAINS phrase one BEFORE {phrase2 AND phrase3} AND LINE_STARTSWITH Therefore we
[[['LINE_CONTAINS', [[['phrase', 'one'], 'BEFORE', [['phrase2'], 'AND', ['phrase3']]]]], 'AND', ['LINE_STARTSWITH', [['Therefore', 'we']]]]]
[0]:
  [['LINE_CONTAINS', [[['phrase', 'one'], 'BEFORE', [['phrase2'], 'AND', ['phrase3']]]]], 'AND', ['LINE_STARTSWITH', [['Therefore', 'we']]]]
  [0]:
    ['LINE_CONTAINS', [[['phrase', 'one'], 'BEFORE', [['phrase2'], 'AND', ['phrase3']]]]]
    - line_directive: 'LINE_CONTAINS'
    - phrase: [[['phrase', 'one'], 'BEFORE', [['phrase2'], 'AND', ['phrase3']]]]
      [0]:
        [['phrase', 'one'], 'BEFORE', [['phrase2'], 'AND', ['phrase3']]]
        [0]:
          ['phrase', 'one']
        [1]:
          BEFORE
        [2]:
          [['phrase2'], 'AND', ['phrase3']]
          [0]:
            ['phrase2']
          [1]:
            AND
          [2]:
            ['phrase3']
  [1]:
    AND
  [2]:
    ['LINE_STARTSWITH', [['Therefore', 'we']]]
    - line_directive: 'LINE_STARTSWITH'
    - phrase: [['Therefore', 'we']]
      [0]:
        ['Therefore', 'we']

LINE_CONTAINS abcd {upto 4 words} xyzw {upto 3 words} pqrs BEFORE something else
[['LINE_CONTAINS', [[['abcd'], ['upto', 4, 'words'], ['xyzw'], ['upto', 3, 'words'], ['pqrs'], 'BEFORE', ['something', 'else']]]]]
[0]:
  ['LINE_CONTAINS', [[['abcd'], ['upto', 4, 'words'], ['xyzw'], ['upto', 3, 'words'], ['pqrs'], 'BEFORE', ['something', 'else']]]]
  - line_directive: 'LINE_CONTAINS'
  - phrase: [[['abcd'], ['upto', 4, 'words'], ['xyzw'], ['upto', 3, 'words'], ['pqrs'], 'BEFORE', ['something', 'else']]]
    [0]:
      [['abcd'], ['upto', 4, 'words'], ['xyzw'], ['upto', 3, 'words'], ['pqrs'], 'BEFORE', ['something', 'else']]
      [0]:
        ['abcd']
      [1]:
        ['upto', 4, 'words']
        - numberofwords: 4
      [2]:
        ['xyzw']
      [3]:
        ['upto', 3, 'words']
        - numberofwords: 3
      [4]:
        ['pqrs']
      [5]:
        BEFORE
      [6]:
        ['something', 'else']
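You can walk these results and pick them apart, but you are fast reaching the point where you should look at building executable nodes from the different precedence levels; see the SimpleBool.py example on the pyparsing wiki for how to do this.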
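For a sense of what walking the results by hand involves, here is a rough sketch that works on the plain nested list for the phrase of the second sample, copied from the output above. The function name phrase_to_regex and the regex fragments are assumptions made for illustration, including the choice to treat BEFORE as "anything in between". It is only meant to show how quickly the hand-rolled version comes to depend on token positions.

```python
import re

# asList() form of the second sample's phrase, copied from the runTests output
phrase = [['abcd'], ['upto', 4, 'words'], ['xyzw'], ['upto', 3, 'words'],
          ['pqrs'], 'BEFORE', ['something', 'else']]


def phrase_to_regex(items):
    """Hypothetical helper: turn one flat phrase list into a regex string."""
    parts = []
    for item in items:
        if isinstance(item, str):
            # a bare string at this level is an operator keyword
            if item != "BEFORE":
                raise ValueError("unsupported operator: %r" % item)
            parts.append(".*")  # assumption: BEFORE means "anything in between"
        elif len(item) == 3 and item[0] == "upto" and isinstance(item[1], int):
            # ['upto', n, 'words']: positional access, since a plain list
            # no longer carries the numberofwords results name
            parts.append(r"(?:\s+\S+){0,%d}\s+" % item[1])
        else:
            # a run of literal words separated by whitespace
            parts.append(r"\s+".join(re.escape(w) for w in item))
    return "".join(parts)


pattern = phrase_to_regex(phrase)
print(pattern)
print(bool(re.search(pattern, "abcd one two xyzw pqrs and then something else")))
```

Note that the upto test has to guess from the shape of the list, so a three-word phrase starting with the word upto could be misread; keeping the ParseResults and using op.numberofwords avoids that guess.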
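EDIT: Please review this pared-down version of a parser for phrase_expr, and how it creates Node instances that themselves generate the output. See how numberofwords is accessed on the operator in the UpToNode class. See how "xyz abc" gets interpreted as "xyz AND abc" with an implicit AND operator.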
import re
from pyparsing import (CaselessKeyword, Empty, Group, ParseException, Suppress, Word,
                       alphas, infixNotation, opAssoc, pyparsing_common, replaceWith)

UPTO, WORDS, AND, OR = map(CaselessKeyword, "upto words and or".split())
keyword = UPTO | WORDS | AND | OR
LBRACE, RBRACE = map(Suppress, "{}")
integer = pyparsing_common.integer()
word = ~keyword + Word(alphas)
upto_expr = Group(LBRACE + UPTO + integer("numberofwords") + WORDS + RBRACE)


class Node(object):
    # base class: keeps the tokens that pyparsing passes to the parse action
    def __init__(self, tokens):
        self.tokens = tokens

    def generate(self):
        pass


class LiteralNode(Node):
    # a single word, emitted as an escaped regex group
    def generate(self):
        return "(%s)" % re.escape(self.tokens[0])

    def __repr__(self):
        return repr(self.tokens[0])


class AndNode(Node):
    # operands sit at the even positions, operators at the odd ones
    def generate(self):
        tokens = self.tokens[0]
        return '.*'.join(t.generate() for t in tokens[::2])

    def __repr__(self):
        return ' AND '.join(repr(t) for t in self.tokens[0].asList()[::2])


class OrNode(Node):
    def generate(self):
        tokens = self.tokens[0]
        return '|'.join(t.generate() for t in tokens[::2])

    def __repr__(self):
        return ' OR '.join(repr(t) for t in self.tokens[0].asList()[::2])


class UpToNode(Node):
    def generate(self):
        tokens = self.tokens[0]
        ret = tokens[0].generate()
        word_re = r"\s+\S+"
        space_re = r"\s+"
        for op, operand in zip(tokens[1::2], tokens[2::2]):
            # op contains the parsed "upto" expression
            ret += "((%s){0,%d}%s)" % (word_re, op.numberofwords, space_re) + operand.generate()
        return ret

    def __repr__(self):
        tokens = self.tokens[0]
        ret = repr(tokens[0])
        for op, operand in zip(tokens[1::2], tokens[2::2]):
            # op contains the parsed "upto" expression
            ret += " {0-%d WORDS} " % (op.numberofwords) + repr(operand)
        return ret


# matches nothing, but emits "AND" so that adjacent words are joined by AndNode
IMPLICIT_AND = Empty().setParseAction(replaceWith("AND"))

phrase_expr = infixNotation(word.setParseAction(LiteralNode),
    [
        (upto_expr, 2, opAssoc.LEFT, UpToNode),
        (AND | IMPLICIT_AND, 2, opAssoc.LEFT, AndNode),
        (OR, 2, opAssoc.LEFT, OrNode),
    ])

tests = """\
    xyz
    xyz abc
    xyz {upto 4 words} def""".splitlines()

for t in tests:
    t = t.strip()
    if not t:
        continue
    print(t)
    try:
        parsed = phrase_expr.parseString(t)
    except ParseException as pe:
        print(' '*pe.loc + '^')
        print(pe)
        continue
    print(parsed)
    print(parsed[0].generate())
    print()
Prints:
xyz
['xyz']
(xyz)
xyz abc
['xyz' AND 'abc']
(xyz).*(abc)
xyz {upto 4 words} def
['xyz' {0-4 WORDS} 'def']
(xyz)((\s+\S+){0,4}\s+)(def)
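As a quick sanity check, the generated patterns can be tried against a few lines with the standard re module. The test strings below are made up for illustration; the two patterns are copied from the printed output above.

```python
import re

# Patterns copied from the printed output above
upto_pattern = re.compile(r"(xyz)((\s+\S+){0,4}\s+)(def)")
and_pattern = re.compile(r"(xyz).*(abc)")

# "upto 4 words": zero to four words may sit between xyz and def
adjacent = upto_pattern.search("xyz def") is not None
four_between = upto_pattern.search("xyz one two three four def") is not None
five_between = upto_pattern.search("xyz one two three four five def") is not None

# the '.*' join used for AND only matches when the operands appear in order
in_order = and_pattern.search("xyz then abc") is not None
reversed_order = and_pattern.search("abc then xyz") is not None

print(adjacent, four_between, five_between)
print(in_order, reversed_order)
```

The last check is worth noting: because AndNode joins its operands with '.*', AND as generated here is order-sensitive. If AND should match the operands in either order, the generate() method would need a different regex, for example one built from lookaheads.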
Expand on this to support your LINE_xxx expressions.
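One possible direction for that extension, sketched with the re module only: wrap the regex generated for each phrase in a lookahead chosen by the line directive, and combine the line terms with AND by concatenating the lookaheads at the start of the line. The helper names line_term_regex and and_line_terms are hypothetical, the phrase regexes are written by hand here rather than produced by the Node classes, and allowing leading whitespace for LINE_STARTSWITH is an assumption. In the Node approach, this logic would presumably live in the generate() method of a node class for line terms.

```python
import re


def line_term_regex(directive, phrase_regex):
    """Hypothetical helper: wrap a phrase regex in a lookahead for one directive."""
    body = "(?:%s)" % phrase_regex  # keeps any top-level '|' contained
    if directive == "LINE_CONTAINS":
        return r"(?=.*%s)" % body
    if directive == "LINE_STARTSWITH":
        return r"(?=\s*%s)" % body  # assumption: leading whitespace is allowed
    if directive == "LINE_ENDSWITH":
        return r"(?=.*%s\s*$)" % body
    raise ValueError("unknown directive: %r" % directive)


def and_line_terms(*term_regexes):
    """AND of line terms: every lookahead must succeed from the start of the line."""
    return "^" + "".join(term_regexes)


# Hand-written phrase regexes for the first sample rule
contains = line_term_regex("LINE_CONTAINS", r"phrase\s+one.*phrase2.*phrase3")
starts = line_term_regex("LINE_STARTSWITH", r"Therefore\s+we")
rule = re.compile(and_line_terms(contains, starts))

matches = rule.search("Therefore we see phrase one, then phrase2 and phrase3") is not None
wrong_start = rule.search("We see phrase one, then phrase2 and phrase3") is not None
wrong_order = rule.search("Therefore we see phrase2, then phrase one and phrase3") is not None
print(matches, wrong_start, wrong_order)
```

Lookaheads are used because each line term constrains the whole line independently, so the terms cannot simply be joined with '.*' the way the phrase-level operands are.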