Reference token value at parse time
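I am trying to parse the following:
<delimiter><text><delimiter><text><delimiter>
where delimiter can be any single literal character, occurring three times, and text can be any printable characters other than the delimiter (the first and second occurrences of text do not have to match, and either may be empty).
This is what I came up with, but text consumes everything from the first delimiter to the end of the string.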
from pyparsing import Word, printables
delimiter = Word(printables, exact=1)
text = (Word(printables) + ~delimiter)
parser = delimiter + text # + delimiter + text + delimiter
tests = [
    ('_abc_123_', ['_', 'abc', '_', '123', '_']),
    ('-abc-123-', ['-', 'abc', '-', '123', '-']),
    ('___', ['_', '', '_', '', '_']),
]
for test, expected in tests:
    print(parser.parseString(test), '<=>', expected)
Script output:
['_', 'abc_123_'] <=> ['_', 'abc', '_', '123', '_']
['-', 'abc-123-'] <=> ['-', 'abc', '-', '123', '-']
['_', '__'] <=> ['_', '', '_', '', '_']
I think I need to use Future, but I am not sure how to exclude the delimiter's value from the text token at parse time.
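Your instinct was correct: you need to use a Forward (not a Future) to hold the definition of text, since it is not fully known until parse time. In addition, your Word has to exclude the delimiter character by using the excludeChars argument. Just using Word(printables) + ~delimiter is not enough, because Word(printables) has already consumed the rest of the string by the time the ~delimiter lookahead is checked.
Here is your code, with the necessary changes marked and, hopefully, some helpful comments: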
from pyparsing import Word, Forward, Optional, printables

delimiter = Word(printables, exact=1)
text = Forward() #(Word(printables) + ~delimiter)
def setTextExcludingDelimiter(s,l,t):
    # define Word as all printable characters, excluding the delimiter character
    # the excludeChars argument for Word is how this is done
    text_word = Word(printables, excludeChars=t[0]).setName("text")
    # use '<<' operator to assign the text_word definition to the
    # previously defined text expression
    text << text_word
# attach parse action to delimiter, so that once it is matched,
# it will define the correct expression for text
delimiter.setParseAction(setTextExcludingDelimiter)
# make the text expressions Optional with default value of '' to satisfy 3rd test case
parser = delimiter + Optional(text,'') + delimiter + Optional(text,'') + delimiter
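To check the approach against the three test strings from the question, here is a minimal end-to-end sketch. It assumes a pyparsing version that supports the excludeChars argument; build_parser and set_text_excluding_delimiter are names made up for this illustration, and wrapping everything in a function is just one way to give each run its own Forward.

```python
from pyparsing import Forward, Optional, Word, printables

def build_parser():
    """Return a parser for <delimiter><text><delimiter><text><delimiter>.

    build_parser is a hypothetical helper name, not part of pyparsing.
    """
    text = Forward()

    def set_text_excluding_delimiter(s, loc, toks):
        # Once a delimiter has matched, define text as every printable
        # character except that delimiter.
        text << Word(printables, excludeChars=toks[0])

    delimiter = Word(printables, exact=1)
    delimiter.setParseAction(set_text_excluding_delimiter)
    # Optional(text, '') yields '' when nothing lies between two delimiters.
    return delimiter + Optional(text, '') + delimiter + Optional(text, '') + delimiter

tests = [
    ('_abc_123_', ['_', 'abc', '_', '123', '_']),
    ('-abc-123-', ['-', 'abc', '-', '123', '-']),
    ('___', ['_', '', '_', '', '_']),
]
results = [build_parser().parseString(s).asList() for s, _ in tests]
for (s, expected), got in zip(tests, results):
    print(got, '<=>', expected)
```

Note that the parse action uses `<<` rather than `<<=`: inside a function, an augmented assignment would make text a local name and raise UnboundLocalError.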