Distinguish matches in pyparsing
I want to parse some words and some numbers with pyparsing. Simple enough.
from pyparsing import *
A = Word(nums).setResultsName('A')
B = Word(alphas).setResultsName('B')
expr = OneOrMore(A | B)
result = expr.parseString("123 abc 456 7 d")
print(result)
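The code above prints ['123', 'abc', '456', '7', 'd'], so everything works. Now I want to do some work with these parsed values, and for that I need to know whether each one matched A or B. Is there a way to distinguish between the two?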
After some research, the only thing I found is the items method of the ParseResults class. But it only returns [('A', '7'), ('B', 'd')], i.e. just the last two matches.
My plan/goal is the following:
for elem in result:
    # is_of_type() is hypothetical: this is the kind of check I'm looking for
    if elem.is_of_type('A'):
        # do stuff
        pass
    elif elem.is_of_type('B'):
        # do something else
        pass
How can I distinguish A from B?
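I'm not entirely sure why, but in your .setResultsName() calls you need to specify listAllMatches=True (the default is False). Once you do, you can iterate over result and check whether each token matched a given expression by testing its membership in the corresponding named sub-result of result: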
from pyparsing import *
# note the listAllMatches=True argument on the next two lines
A = Word(nums).setResultsName('A', listAllMatches=True)
B = Word(alphas).setResultsName('B', listAllMatches=True)
expr = OneOrMore(A | B)
result = expr.parseString("123 abc 456 7 d")
for elem in result:
    if elem in list(result['A']):
        print(elem, 'is in A')
    elif elem in list(result['B']):
        print(elem, 'is in B')
This prints:
123 is in A
abc is in B
456 is in A
7 is in A
d is in B
It's a bit messy, and I'm not sure this is the canonical way to do it, but it seems to work.
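To see what listAllMatches=True changes, here is a minimal sketch, assuming pyparsing is installed (the names TEXT, last_only and all_matches are illustrative). It parses the same input with and without the flag and compares what result['A'] and result['B'] return. This matches the observation in the question that items() reports only the last match for each name.

```python
from pyparsing import OneOrMore, Word, alphas, nums

TEXT = "123 abc 456 7 d"

# Default behaviour: a results name keeps only its most recent match.
last_only = OneOrMore(
    Word(nums).setResultsName('A') | Word(alphas).setResultsName('B')
).parseString(TEXT)

# listAllMatches=True: every match is accumulated under the name.
all_matches = OneOrMore(
    Word(nums).setResultsName('A', listAllMatches=True)
    | Word(alphas).setResultsName('B', listAllMatches=True)
).parseString(TEXT)

print(last_only['A'], last_only['B'])                  # 7 d
print(list(all_matches['A']), list(all_matches['B']))  # ['123', '456', '7'] ['abc', 'd']
```

Because the membership test compares token text, it works here only because a string of digits can never also match Word(alphas).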
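I found a solution myself. The ParseResults class has a getName() method. But simply iterating over result and printing getName() throws an error, because iterating over a ParseResults object yields plain strings. One solution is to wrap A and B in Groups. I'm not sure this is the best way, but it works well.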
from pyparsing import *
A = Group(Word(nums)).setResultsName('A', listAllMatches=True)
B = Group(Word(alphas)).setResultsName('B', listAllMatches=True)
expr = OneOrMore(A | B)
result = expr.parseString("123 abc 456 7 d")
for i in result:
    # each i is a grouped ParseResults, so it carries its results name
    if i.getName() == 'A':
        print(i[0], 'is a number')
    elif i.getName() == 'B':
        print(i[0], 'is a string')
getName() does the job well. You can also explicitly decorate the returned tokens with a marker indicating which expression matched:
from pyparsing import *

def makeDecoratingParseAction(marker):
    def parse_action_impl(s, l, t):
        # replace the matched token with a (marker, token) tuple
        return (marker, t[0])
    return parse_action_impl

A = Word(nums).setParseAction(makeDecoratingParseAction("A"))
B = Word(alphas).setParseAction(makeDecoratingParseAction("B"))
expr = OneOrMore(A | B)
result = expr.parseString("123 abc 456 7 d")
print(result.asList())
gives:
[('A', '123'), ('B', 'abc'), ('A', '456'), ('A', '7'), ('B', 'd')]
Now you can iterate over the returned tuples, each of which is tagged with the appropriate marker.
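A minimal sketch of that iteration, following the if/elif plan from the question but dispatching through a dict. The list is copied from the output above so the snippet runs without a parser; handle_a, handle_b and what they compute are hypothetical placeholders for your own per-type logic.

```python
# Tagged tokens, copied from the asList() output above.
tagged = [('A', '123'), ('B', 'abc'), ('A', '456'), ('A', '7'), ('B', 'd')]

def handle_a(value):
    """Hypothetical handler for numeric (A) tokens: convert to int."""
    return int(value)

def handle_b(value):
    """Hypothetical handler for alphabetic (B) tokens: upper-case them."""
    return value.upper()

# Map each marker to its handler, then dispatch on the tag of each tuple.
handlers = {'A': handle_a, 'B': handle_b}
processed = [handlers[marker](value) for marker, value in tagged]
print(processed)  # [123, 'ABC', 456, 7, 'D']
```

A dict lookup keeps the loop body the same size however many token types you add. A plain if marker == 'A': ... elif marker == 'B': ... chain works just as well for two types.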
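You can go a step further and use a class to capture both the type and the type-specific post-parse logic, then pass the class itself as the expression's parse action. This creates instances of the classes in the returned ParseResults, which you can then execute directly through some kind of exec or doIt method: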
from pyparsing import *

class ResultsHandler(object):
    """Define base class to initialize location and tokens.
    Call subclass-specific post_init() if one is defined."""
    def __init__(self, s, locn, tokens):
        self.locn = locn
        self.tokens = tokens
        if hasattr(self, "post_init"):
            self.post_init()

class AHandler(ResultsHandler):
    """Handler for A expressions, which contain a numeric string."""
    def post_init(self):
        self.int_value = int(self.tokens[0])
        self.odd_even = ("EVEN", "ODD")[self.int_value % 2]
    def doIt(self):
        print("An A-Type was found at %d with value %d, which is an %s number" % (
            self.locn, self.int_value, self.odd_even))

class BHandler(ResultsHandler):
    """Handler for B expressions, which contain an alphabetic string."""
    def post_init(self):
        self.string = self.tokens[0]
        self.vowels_count = sum(self.string.lower().count(c) for c in "aeiou")
    def doIt(self):
        print("A B-Type was found at %d with value %s, and contains %d vowels" % (
            self.locn, self.string, self.vowels_count))

# pass expression-specific handler classes as parse actions
A = Word(nums).setParseAction(AHandler)
B = Word(alphas).setParseAction(BHandler)
expr = OneOrMore(A | B)

# parse string and run handlers
result = expr.parseString("123 abc 456 7 d")
for handler in result:
    handler.doIt()
prints:
An A-Type was found at 0 with value 123, which is an ODD number
A B-Type was found at 4 with value abc, and contains 1 vowels
An A-Type was found at 8 with value 456, which is an EVEN number
An A-Type was found at 12 with value 7, which is an ODD number
A B-Type was found at 14 with value d, and contains 0 vowels