Antlr4 parser ends prematurely on misplaced token in Python 3.7
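I've run into a problem: when my parser finds a token that it cannot place in any rule, it ends without explicitly reporting an error, even though there are more tokens left to place. To be precise, the token is recognized by the lexer (I have a rule that is nearly a catch-all), but it is misplaced and no parser rule can cover it. In that case my parser finishes successfully without reporting any error (at least not out loud).
Here is the situation I'm seeing:
Code to parse: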
.class public final Ld;
.super Ljava/lang/Object;
.source "java-style lambda group"
# interfaces
.implements Landroid/content/DialogInterface$OnClickListener;
<misplaced-tokens>
# static fields
.field public static final f:Ld;
.field public static final g:Ld;
...
(Note the <misplaced-tokens> marker, which is actually five tokens; see below. I expect the parse to fail here.)
Tokens produced by the lexer:
[@0,0:5='.class',<'.class'>,1:0]
[@1,7:12='public',<'public'>,1:7]
[@2,14:18='final',<'final'>,1:14]
[@3,20:22='Ld;',<QUALIFIED_TYPE_NAME>,1:20]
[@4,24:29='.super',<'.super'>,2:0]
[@5,31:48='Ljava/lang/Object;',<QUALIFIED_TYPE_NAME>,2:7]
[@6,50:56='.source',<'.source'>,3:0]
[@7,58:82='"java-style lambda group"',<STRING_LITERAL>,3:8]
[@8,85:96='# interfaces',<LINE_COMMENT>,channel=1,5:0]
[@9,98:108='.implements',<'.implements'>,6:0]
[@10,110:158='Landroid/content/DialogInterface$OnClickListener;',<QUALIFIED_TYPE_NAME>,6:12]
[@11,160:160='<',<'<'>,7:0]
[@12,161:169='misplaced',<IDENTIFIER>,7:1]
[@13,170:170='-',<'-'>,7:10]
[@14,171:176='tokens',<IDENTIFIER>,7:11]
[@15,177:177='>',<'>'>,7:17]
[@16,180:194='# static fields',<LINE_COMMENT>,channel=1,9:0]
[@17,196:201='.field',<'.field'>,10:0]
...
Parsing progress:
enter parse, LT(1)=.class
enter statement, LT(1)=.class
enter classDirective, LT(1)=.class
consume [@0,0:5='.class',<30>,1:0] rule classDirective
enter classModifier, LT(1)=public
consume [@1,7:12='public',<53>,1:7] rule classModifier
exit classModifier, LT(1)=final
enter classModifier, LT(1)=final
consume [@2,14:18='final',<56>,1:14] rule classModifier
exit classModifier, LT(1)=Ld;
enter className, LT(1)=Ld;
enter referenceType, LT(1)=Ld;
consume [@3,20:22='Ld;',<1>,1:20] rule referenceType
exit referenceType, LT(1)=.super
exit className, LT(1)=.super
exit classDirective, LT(1)=.super
exit statement, LT(1)=.super
enter statement, LT(1)=.super
enter superDirective, LT(1)=.super
consume [@4,24:29='.super',<33>,2:0] rule superDirective
enter superName, LT(1)=Ljava/lang/Object;
enter referenceType, LT(1)=Ljava/lang/Object;
consume [@5,31:48='Ljava/lang/Object;',<1>,2:7] rule referenceType
exit referenceType, LT(1)=.source
exit superName, LT(1)=.source
exit superDirective, LT(1)=.source
exit statement, LT(1)=.source
enter statement, LT(1)=.source
enter sourceDirective, LT(1)=.source
consume [@6,50:56='.source',<32>,3:0] rule sourceDirective
enter sourceName, LT(1)="java-style lambda group"
enter stringLiteral, LT(1)="java-style lambda group"
consume [@7,58:82='"java-style lambda group"',<304>,3:8] rule stringLiteral
exit stringLiteral, LT(1)=.implements
exit sourceName, LT(1)=.implements
exit sourceDirective, LT(1)=.implements
exit statement, LT(1)=.implements
enter statement, LT(1)=.implements
enter implementsDirective, LT(1)=.implements
consume [@9,98:108='.implements',<31>,6:0] rule implementsDirective
enter implementsName, LT(1)=Landroid/content/DialogInterface$OnClickListener;
enter referenceType, LT(1)=Landroid/content/DialogInterface$OnClickListener;
consume [@10,110:158='Landroid/content/DialogInterface$OnClickListener;',<1>,6:12] rule referenceType
exit referenceType, LT(1)=<
exit implementsName, LT(1)=<
exit implementsDirective, LT(1)=<
exit statement, LT(1)=<
exit parse, LT(1)=<
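(Notice that parse is the main rule and that it actually exits here, even though there are more tokens in the pipeline.)
What I tried:
I tried reimplementing the default error strategy and error listener and adding them to both the lexer and the parser, just to see whether any breakpoints would be hit. No breakpoint in any of the overridden methods is ever hit (except, occasionally, reportAttemptingFullContext).
This is how I add the overrides: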
from antlr4 import CommonTokenStream, FileStream

def parseFile(self, filePath):
    errorListener = MyErrorListener()
    strategy = MyErrorStrategy()
    file = FileStream(filePath)  # was hard-coded to "file.smali"
    lexer = SmaliLexer(file)
    lexer.removeErrorListeners()
    lexer.addErrorListener(errorListener)
    # A lexer has no error strategy, so the strategy is only installed on the parser.
    stream = CommonTokenStream(lexer)
    parser = SmaliParser(stream)
    parser.removeErrorListeners()
    parser.addErrorListener(errorListener)
    # An error strategy is not a listener; the Python runtime keeps it in parser._errHandler.
    parser._errHandler = strategy
    tree = parser.parse()
    ...
My setup:
Windows 10 OS
Python 3.7
Antlr4 v4.8 - antlr-4.8-complete.jar
pip-installed runtime: antlr4_python3_runtime-4.8-py3-none-any.whl
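I would greatly appreciate any help on how to make Antlr4 actually honor the overridden listener and strategy, so that I can both report errors for debugging and handle them differently. Thanks!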
Antlr4 parser ends prematurely
This can happen when the rule you invoke (parse, in your case) is not "anchored" by the built-in EOF token:
parse
: expression
;
expression
: expression '+' expression
| NUMBER
;
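In the example above, when the input is 1+2 3, the generated parser will happily parse 1+2 and stop there, leaving the 3 unconsumed.
If you want to force the parser to consume all tokens in the input stream, add EOF to your start rule: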
parse
: expression EOF
;
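To make the effect of the EOF anchor concrete, here is a minimal toy model in plain Python. It is not ANTLR's generated code: the tokenizer, the "<EOF>" sentinel, and the function names parse_unanchored and parse_anchored are all made up for illustration, and the left-recursive expression rule is written as the equivalent loop NUMBER ('+' NUMBER)*. It only shows why a start rule without EOF can return successfully while tokens remain.

```python
import re

# Hypothetical toy tokenizer: integers and '+', with whitespace skipped.
TOKEN_RE = re.compile(r"\s*(\d+|\+)")


def tokenize(text):
    """Split text into tokens and append an "<EOF>" sentinel."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"cannot tokenize input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    tokens.append("<EOF>")
    return tokens


def parse_expression(tokens, pos):
    """expression : NUMBER ('+' NUMBER)* ; returns the index of the first unconsumed token."""
    if not tokens[pos].isdigit():
        raise SyntaxError(f"expected NUMBER, got {tokens[pos]!r}")
    pos += 1
    while tokens[pos] == "+":
        pos += 1
        if not tokens[pos].isdigit():
            raise SyntaxError(f"expected NUMBER, got {tokens[pos]!r}")
        pos += 1
    return pos


def parse_unanchored(text):
    """parse : expression ; stops as soon as the rule is satisfied."""
    tokens = tokenize(text)
    end = parse_expression(tokens, 0)
    return tokens[:end]  # whatever follows is silently left in the stream


def parse_anchored(text):
    """parse : expression EOF ; leftover tokens become a syntax error."""
    tokens = tokenize(text)
    end = parse_expression(tokens, 0)
    if tokens[end] != "<EOF>":
        raise SyntaxError(f"extraneous input {tokens[end]!r}, expected <EOF>")
    return tokens[:end]
```

With the input 1+2 3, parse_unanchored returns the tokens for 1+2 and ignores the 3, while parse_anchored raises a SyntaxError for it. In the question's grammar, the analogous change would presumably be to end the parse rule with EOF.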