Pyparsing finding more matches than expected
I am writing code to parse lines of basic computer instructions. My input string looks like this: ADD(input1,input2) DEL(input3), SUB(input1,input2) INS(input3)
I expect a result like this:
<line>
<instruction>
<type>ADD</type>
<args>
<ITEM>input1</ITEM>
<ITEM>input2</ITEM>
</args>
</instruction>
<instruction>
<type>DEL</type>
<args>
<ITEM>input3</ITEM>
</args>
</instruction>
</line>
<line>
<instruction>
<type>SUB</type>
<args>
<ITEM>input1</ITEM>
<ITEM>input2</ITEM>
</args>
</instruction>
<instruction>
<type>INS</type>
<args>
<ITEM>input3</ITEM>
</args>
</instruction>
</line>
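My actual result has the general structure I am looking for, but the line and instruction parsers seem to match in the wrong places, or perhaps the tags are ending up in the wrong places.
Actual result: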
<line>
<line>
<instruction>
<type>ADD</type>
<args>
<ITEM>input1</ITEM>
<ITEM>input2</ITEM>
</args>
</instruction>
<instruction>
<type>DEL</type>
<args>
<ITEM>input3</ITEM>
</args>
</instruction>
</line>
<instruction>
<instruction>
<type>SUB</type>
<args>
<ITEM>input1</ITEM>
<ITEM>input2</ITEM>
</args>
</instruction>
<instruction>
<type>INS</type>
<args>
<ITEM>input3</ITEM>
</args>
</instruction>
</instruction>
</line>
Results dump:
[[['OTE', ['output1']]], [['XIO', ['input2']], ['OTE', ['output2']]]]
- branch: [[['OTE', ['output1']]], [['XIO', ['input2']], ['OTE', ['output2']]]]
[0]:
[['OTE', ['output1']]]
- instruction: ['OTE', ['output1']]
- args: ['output1']
- type: 'OTE'
[1]:
[['XIO', ['input2']], ['OTE', ['output2']]]
- instruction: ['OTE', ['output2']]
- args: ['output2']
- type: 'OTE'
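For some reason, line matches the entire structure, and the instructions on the second line are matched as a single instruction group. I tried calling .setDebug() on the instruction parser, but I am not sure how to interpret the output. I don't see why the last line should match as an instruction, since it doesn't follow the Word(Word) pattern.
My code: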
#!python3
from pyparsing import Word, Literal, OneOrMore, alphanums, delimitedList, Group
theInput = r"ADD(input1,input2) DEL(input3), SUB(input1,input2) INS(input3)"
# An instruction is a type name followed by a parenthesized, comma-separated argument list.
instructionType = Word(alphanums+"_")("type")
argument = Word(alphanums+"_[].")
arguments = Group(delimitedList(argument))("args")
instruction = Group(instructionType + Literal("(").suppress() + arguments + Literal(")").suppress())("instruction")
# A line is a comma-separated list of groups, each holding one or more instructions.
line = (delimitedList(Group(OneOrMore(instruction))))("line")
parsedInput = line.parseString(theInput).asXML()
print(parsedInput)
Debug output:
Match Group:({W:(ABCD...) Suppress:("(") Group:(W:(ABCD...) [, W:(ABCD...)]...) Suppress:(")")}) at loc 0(1,1)
Matched Group:({W:(ABCD...) Suppress:("(") Group:(W:(ABCD...) [, W:(ABCD...)]...) Suppress:(")")}) -> [['ADD', ['input1', 'input2']]]
Match Group:({W:(ABCD...) Suppress:("(") Group:(W:(ABCD...) [, W:(ABCD...)]...) Suppress:(")")}) at loc 18(1,19)
Matched Group:({W:(ABCD...) Suppress:("(") Group:(W:(ABCD...) [, W:(ABCD...)]...) Suppress:(")")}) -> [['DEL', ['input3']]]
Match Group:({W:(ABCD...) Suppress:("(") Group:(W:(ABCD...) [, W:(ABCD...)]...) Suppress:(")")}) at loc 30(1,31)
Exception raised:Expected W:(ABCD...) (at char 30), (line:1, col:31)
Match Group:({W:(ABCD...) Suppress:("(") Group:(W:(ABCD...) [, W:(ABCD...)]...) Suppress:(")")}) at loc 32(1,33)
Matched Group:({W:(ABCD...) Suppress:("(") Group:(W:(ABCD...) [, W:(ABCD...)]...) Suppress:(")")}) -> [['SUB', ['input1', 'input2']]]
Match Group:({W:(ABCD...) Suppress:("(") Group:(W:(ABCD...) [, W:(ABCD...)]...) Suppress:(")")}) at loc 50(1,51)
Matched Group:({W:(ABCD...) Suppress:("(") Group:(W:(ABCD...) [, W:(ABCD...)]...) Suppress:(")")}) -> [['INS', ['input3']]]
Match Group:({W:(ABCD...) Suppress:("(") Group:(W:(ABCD...) [, W:(ABCD...)]...) Suppress:(")")}) at loc 62(1,63)
Exception raised:Expected W:(ABCD...) (at char 62), (line:1, col:63)
What am I doing wrong?
Here is what the dump() output looks like for the code you posted:
ADD(input1,input2) DEL(input3), SUB(input1,input2) INS(input3)
[[['ADD', ['input1', 'input2']], ['DEL', ['input3']]], [['SUB', ['input1', 'input2']], ['INS', ['input3']]]]
- line: [[['ADD', ['input1', 'input2']], ['DEL', ['input3']]], [['SUB', ['input1', 'input2']], ['INS', ['input3']]]]
[0]:
[['ADD', ['input1', 'input2']], ['DEL', ['input3']]]
- instruction: ['DEL', ['input3']]
- args: ['input3']
- type: 'DEL'
[1]:
[['SUB', ['input1', 'input2']], ['INS', ['input3']]]
- instruction: ['INS', ['input3']]
- args: ['input3']
- type: 'INS'
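We can see in the dump() output that all of the instructions are being parsed, but only the last instruction in each group appears under the "instruction" name. This happens because, just as with a Python dict, when multiple values (such as you might get inside a ZeroOrMore or OneOrMore) are assigned to the same key, only the last value is kept.
There are two ways to fix this. One is to remove the ("instruction") results name, so that you simply get the parsed instructions in each sublist: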
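The last-value-wins behavior can be reproduced on a single group of instructions. This is a minimal sketch, assuming a pyparsing version that still provides the camelCase names used in the question (parseString, delimitedList, asList); the snake_case variable names are illustrative and do not come from the original post.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphanums, delimitedList

# Same grammar shape as in the question, reduced to one group of instructions.
instruction_type = Word(alphanums + "_")("type")
arguments = Group(delimitedList(Word(alphanums + "_[].")))("args")
instruction = Group(
    instruction_type + Suppress("(") + arguments + Suppress(")")
)("instruction")
group = Group(OneOrMore(instruction))

result = group.parseString("ADD(input1,input2) DEL(input3)")
first_group = result[0]

# Both instructions are present positionally...
print(len(first_group))  # 2

# ...but the "instruction" name only refers to the last one that matched.
last_named = first_group["instruction"].asList()
print(last_named)  # ['DEL', ['input3']]
```

Both instructions are still in the parsed list; only the named lookup is limited to the final match.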
[[['ADD', ['input1', 'input2']], ['DEL', ['input3']]], [['SUB', ['input1', 'input2']], ['INS', ['input3']]]]
- line: [[['ADD', ['input1', 'input2']], ['DEL', ['input3']]], [['SUB', ['input1', 'input2']], ['INS', ['input3']]]]
[0]:
[['ADD', ['input1', 'input2']], ['DEL', ['input3']]]
[0]:
['ADD', ['input1', 'input2']]
- args: ['input1', 'input2']
- type: 'ADD'
[1]:
['DEL', ['input3']]
- args: ['input3']
- type: 'DEL'
[1]:
[['SUB', ['input1', 'input2']], ['INS', ['input3']]]
[0]:
['SUB', ['input1', 'input2']]
- args: ['input1', 'input2']
- type: 'SUB'
[1]:
['INS', ['input3']]
- args: ['input3']
- type: 'INS'
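As a rough sketch of this first approach, the grammar below drops the ("instruction") name and walks the nested lists by position, reading the "type" and "args" names that remain on each instruction. The variable names (the_input, summary) are illustrative assumptions, and the camelCase pyparsing names from the question are assumed to be available.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphanums, delimitedList

the_input = "ADD(input1,input2) DEL(input3), SUB(input1,input2) INS(input3)"

instruction_type = Word(alphanums + "_")("type")
arguments = Group(delimitedList(Word(alphanums + "_[].")))("args")
# No results name on the instruction group itself.
instruction = Group(instruction_type + Suppress("(") + arguments + Suppress(")"))
line = delimitedList(Group(OneOrMore(instruction)))("line")

result = line.parseString(the_input)

# Each top-level item is one comma-separated group; each item inside it
# is one instruction whose "type" and "args" names are still accessible.
summary = [
    [(instr["type"], instr["args"].asList()) for instr in group]
    for group in result
]
for group_summary in summary:
    print(group_summary)
```

Positional access like this is usually enough when every item in a sublist has the same shape.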
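The other way relies on the fact that pyparsing sometimes needs to keep multiple values for a given name. The setResultsName() method has an optional argument, listAllMatches, that enables this behavior. When you use the callable shortcut for setResultsName, you cannot pass listAllMatches=True; instead, end the results name with "*":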
instruction = Group(instructionType
+ Literal("(").suppress()
+ arguments
+ Literal(")").suppress())("instruction*")
This gives the following output:
[[['ADD', ['input1', 'input2']], ['DEL', ['input3']]], [['SUB', ['input1', 'input2']], ['INS', ['input3']]]]
- line: [[['ADD', ['input1', 'input2']], ['DEL', ['input3']]], [['SUB', ['input1', 'input2']], ['INS', ['input3']]]]
[0]:
[['ADD', ['input1', 'input2']], ['DEL', ['input3']]]
- instruction: [['ADD', ['input1', 'input2']], ['DEL', ['input3']]]
[0]:
['ADD', ['input1', 'input2']]
- args: ['input1', 'input2']
- type: 'ADD'
[1]:
['DEL', ['input3']]
- args: ['input3']
- type: 'DEL'
[1]:
[['SUB', ['input1', 'input2']], ['INS', ['input3']]]
- instruction: [['SUB', ['input1', 'input2']], ['INS', ['input3']]]
[0]:
['SUB', ['input1', 'input2']]
- args: ['input1', 'input2']
- type: 'SUB'
[1]:
['INS', ['input3']]
- args: ['input3']
- type: 'INS'
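For comparison, here is a small sketch of the explicit setResultsName(..., listAllMatches=True) form next to the "*" shortcut, on a single group of instructions. It assumes the camelCase pyparsing API used in the question; the variable names are illustrative only.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphanums, delimitedList

instruction_type = Word(alphanums + "_")("type")
arguments = Group(delimitedList(Word(alphanums + "_[].")))("args")
body = instruction_type + Suppress("(") + arguments + Suppress(")")

# Explicit form: keep every match under the "instruction" name.
explicit = Group(body).setResultsName("instruction", listAllMatches=True)
# Shortcut form: the trailing "*" asks for the same behavior.
starred = Group(body)("instruction*")

text = "ADD(input1,input2) DEL(input3)"
explicit_group = Group(OneOrMore(explicit)).parseString(text)[0]
starred_group = Group(OneOrMore(starred)).parseString(text)[0]

# The name now yields a list of all matched instructions, not just the last.
explicit_types = [instr["type"] for instr in explicit_group["instruction"]]
starred_types = [instr["type"] for instr in starred_group["instruction"]]
print(explicit_types)  # ['ADD', 'DEL']
print(starred_types)   # ['ADD', 'DEL']
```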
You can choose whichever approach you prefer.