Pyparsing Forward() Grammar Recursion
I am using Pyparsing to parse a log file that contains blocks like the following:
keyName0: foo
keyName1: bar
msgKey [Read]: 21 FA 00 34
msgKey [Read]:
MESSAGE 1 of 2
keyName0: keyValue0
keyName1: keyValue1
Flags1: No Flags Set
Flags1: 0
Flags2: No Flags Set
Flags2: 0
keyName6: AB34CD56EF (123456789)
keyName7: 7
keyName8: 7
Data [Read]: 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15
20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
36 37 38
msgKey [Read]: 01 02 03 04
msgKey [Read]:
MESSAGE 2 of 2
# same structure as message above
keyName3: keyValue3
keyName4 [IN]: keyValue4 (123 IN)
keyName4 [OUT]: keyValue4 (123 OUT)
I wrote a grammar for the keyName/value lines:
key_line = lineEnd + OneOrMore(Word(printables_no_column)).setParseAction(' '.join).setResultsName('keyName') + Suppress(':') \
+ OneOrMore(Word(printables_no_column), stopOn=lineEnd).setParseAction(' '.join).setResultsName('keyValue')
This grammar works for a single line. I then tried to use it to describe the grammar for the whole test data:
message = Forward()
key_line = lineEnd + OneOrMore(Word(printables_no_column)).setParseAction(' '.join).setResultsName('keyName') + Suppress(':') \
+ MatchFirst(message, OneOrMore(Word(printables_no_column),stopOn=lineEnd).setParseAction(' '.join).setResultsName('keyValue'))
key_lines = ZeroOrMore(Group(key_line)).setResultsName('keys')
message << Literal('MESSAGE') + number + Literal('of') \
+ number.setResultsName('totalMsgs') + key_lines
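However, I think this grammar ends up in infinite recursion. I need help figuring out how to use Forward() correctly for recursive grammars. Thanks a lot!
This should get you a bit further. The overall structure could probably still be improved, but I think the basic pieces are all here. See the embedded comments: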
import pyparsing as pp
# your original expression - x.setResultsName("x") can now be written as just x("x")
# key_line = (lineEnd
# + OneOrMore(Word(printables_no_column)).setParseAction(' '.join)('keyName')
# + Suppress(':')
# + OneOrMore(Word(printables_no_column), stopOn=lineEnd).setParseAction(' '.join)('keyValue'))
# literals in your grammar will be suppressed by default
pp.ParserElement.inlineLiteralsUsing(pp.Suppress)
integer = pp.pyparsing_common.integer
hex_byte = pp.Word(pp.hexnums, exact=2)
# read everything up to ':' - a little risky to define a Word including spaces, may want to revisit and
# explicitly parse bits, to detect "[IN]" vs "[OUT]", etc.
key_name_expr = pp.Word(pp.printables + " ", excludeChars=':')
key_line = pp.Group(key_name_expr("key_name") + ':'
+ ~pp.lineEnd() # make sure key value is on this same line
+ pp.empty() # handy trick to advance past white space
+ pp.restOfLine()('key_value'))
# special key_line to read data bytes
data_body = "Data [Read]:" + pp.OneOrMore(hex_byte)
msg_body = ("msgKey [Read]:" + pp.lineEnd()
+ "MESSAGE" + integer("message_num") + "of" + integer("total_msgs")
+ pp.OneOrMore(pp.Group(key_line)("params*"), stopOn=data_body)
+ data_body("data"))
msg_expr = (pp.OneOrMore(pp.LineStart() + pp.Group(key_line)("params*"), stopOn=msg_body)
+ pp.Optional(pp.Group(msg_body)("body")))
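Two idioms in the grammar above are easy to miss: the `~lineEnd + empty + restOfLine` sequence that keeps a value on the same line as its key, and the trailing `*` in `"params*"`, which keeps every match under that name instead of only the last one. The following is a minimal sketch that isolates just those two pieces on a two-line sample; `key_lines`, `sample`, and `pairs` are illustrative names, and the single level of `Group` here differs from the double grouping in the answer.

```python
import pyparsing as pp

# Key name: everything up to ':' (spaces allowed, so "keyName4 [IN]" is one token)
key_name_expr = pp.Word(pp.printables + " ", excludeChars=":")

key_line = pp.Group(
    key_name_expr("key_name")
    + pp.Suppress(":")
    + ~pp.LineEnd()               # fail if nothing follows ':' on this line
    + pp.Empty()                  # matches nothing, but skips the blanks after ':'
    + pp.restOfLine("key_value")  # the remainder of the current line
)

# The trailing '*' collects all matches under "params", not just the last one
key_lines = pp.OneOrMore(key_line("params*"))

sample = "keyName3: keyValue3\nkeyName4 [IN]: keyValue4 (123 IN)\n"
parsed = key_lines.parseString(sample)

# Each entry in parsed.params is one grouped key line with its own named fields
pairs = [(p.key_name, p.key_value) for p in parsed.params]
print(pairs)
```

A line such as `msgKey [Read]:` with its value on the following line is rejected by `key_line` because of the negative lookahead, which is what lets `msg_body` claim it instead.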
Use searchString to find the matching blocks and dump them out:
for match in msg_expr.searchString(source):
    print(match.dump())

    # some sample code showing how to access parsed data fields
    if match.body:
        print("Msg {message_num}/{total_msgs}".format_map(match.body))
        print(match.body.data)
    print()
Prints (excerpt shown):
[[['keyName1', '2']], [['msgKey [Read]', '21 FA 00 34']], ['\n', 1, 2, [['keyName0', 'keyValue0']], [['keyName1', 'keyValue1']], [['Flags1', 'No Flags Set']], [['Flags1', '0']], [['Flags2', 'No Flags Set']], [['Flags2', '0']], [['keyName6', 'AB34CD56EF (123456789)']], [['keyName7', '7']], [['keyName8', '7']], '00', '01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38']]
- body: ['\n', 1, 2, [['keyName0', 'keyValue0']], [['keyName1', 'keyValue1']], [['Flags1', 'No Flags Set']], [['Flags1', '0']], [['Flags2', 'No Flags Set']], [['Flags2', '0']], [['keyName6', 'AB34CD56EF (123456789)']], [['keyName7', '7']], [['keyName8', '7']], '00', '01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38']
- data: ['00', '01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38']
- message_num: 1
- params: [[['keyName0', 'keyValue0']], [['keyName1', 'keyValue1']], [['Flags1', 'No Flags Set']], [['Flags1', '0']], [['Flags2', 'No Flags Set']], [['Flags2', '0']], [['keyName6', 'AB34CD56EF (123456789)']], [['keyName7', '7']], [['keyName8', '7']]]
[0]:
[['keyName0', 'keyValue0']]
[0]:
['keyName0', 'keyValue0']
- key_name: 'keyName0'
- key_value: 'keyValue0'
[1]:
[['keyName1', 'keyValue1']]
[0]:
['keyName1', 'keyValue1']
- key_name: 'keyName1'
- key_value: 'keyValue1'
[2]:
[['Flags1', 'No Flags Set']]
[0]:
['Flags1', 'No Flags Set']
- key_name: 'Flags1'
- key_value: 'No Flags Set'
[3]:
[['Flags1', '0']]
[0]:
['Flags1', '0']
- key_name: 'Flags1'
- key_value: '0'
[4]:
[['Flags2', 'No Flags Set']]
[0]:
['Flags2', 'No Flags Set']
- key_name: 'Flags2'
- key_value: 'No Flags Set'
[5]:
[['Flags2', '0']]
[0]:
['Flags2', '0']
- key_name: 'Flags2'
- key_value: '0'
[6]:
[['keyName6', 'AB34CD56EF (123456789)']]
[0]:
['keyName6', 'AB34CD56EF (123456789)']
- key_name: 'keyName6'
- key_value: 'AB34CD56EF (123456789)'
[7]:
[['keyName7', '7']]
[0]:
['keyName7', '7']
- key_name: 'keyName7'
- key_value: '7'
[8]:
[['keyName8', '7']]
[0]:
['keyName8', '7']
- key_name: 'keyName8'
- key_value: '7'
- total_msgs: 2
- params: [[['keyName1', '2']], [['msgKey [Read]', '21 FA 00 34']]]
[0]:
[['keyName1', '2']]
[0]:
['keyName1', '2']
- key_name: 'keyName1'
- key_value: '2'
[1]:
[['msgKey [Read]', '21 FA 00 34']]
[0]:
['msgKey [Read]', '21 FA 00 34']
- key_name: 'msgKey [Read]'
- key_value: '21 FA 00 34'
Msg 1/2
['00', '01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38']
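On the Forward() part of the question: the grammar above does not use Forward at all, which seems sufficient as long as messages do not nest inside other messages. For reference, here is a minimal sketch of how a Forward is typically wired up, using a hypothetical toy format of nested bracketed integer lists rather than the log format. The names `item` and `nested_list` are illustrative only.

```python
import pyparsing as pp

# Hypothetical toy format (not the log format above): "[1 [2 3] 4]"
LBRACK, RBRACK = map(pp.Suppress, "[]")
integer = pp.pyparsing_common.integer

# Declare the recursive element first as a placeholder...
item = pp.Forward()
# ...use it inside the expression that contains it...
nested_list = pp.Group(LBRACK + pp.ZeroOrMore(item) + RBRACK)
# ...then fill in the placeholder's definition with '<<='
item <<= integer | nested_list

result = nested_list.parseString("[1 [2 3] 4]", parseAll=True)
print(result.asList())  # [[1, [2, 3], 4]]
```

`<<=` is used rather than `<<` because of operator precedence: `item << integer | nested_list` would be evaluated as `(item << integer) | nested_list`, leaving the second alternative out of the Forward. Note also that `nested_list` consumes a `[` before it can re-enter `item`, so each level of recursion advances through the input.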