Antlr semantic predicate failed to find viable alternative
I can't get even a simple semantic predicate to work with Antlr 4.6.6 for .NET Framework 4.8.
The grammar below fails to find a viable alternative for the input "received:last week".
grammar test;
// Parser rules
parse
: expr (expr)* EOF
;
expr
: {false}? received ':' lastWeek
| received ':' text
| text
;
received: RECEIVED;
lastWeek: LASTWEEK;
text: TEXT;
RECEIVED: 'received';
TEXT
:
~(' ' | ':')+
;
LASTWEEK: 'last week';
SPACES: [ \t\r\n] -> skip;
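Update:
This is a simplification of my problem. Is it possible to have a grammar that parses "received:last week" as "received" "last week" only when "last week" is preceded by "received", while, for example, "subject:last week" is parsed as "subject" "last" "week"?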
When I run this code:
public static void main(String[] args) {
String source = "received:last week";
testLexer lexer = new testLexer(CharStreams.fromString(source));
testParser parser = new testParser(new CommonTokenStream(lexer));
System.out.println(parser.parse().toStringTree(parser));
}
the error line 1:0 no viable alternative at input 'received' is printed to STDERR. When I change {false}? to {true}?, the input is parsed correctly (as expected).
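If you expected the input to be parsed as received ':' text because of the {false}? predicate, then you are misunderstanding how ANTLR's lexer works. The lexer produces tokens independently of the parser. It does not matter that the parser is trying to match a TEXT token: your input is always tokenized in the same way.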
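The lexer works like this:
- try to consume as many characters as possible
- if two or more lexer rules match the same characters, let the one defined first "win"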
Given these rules, it is clear that "received:last week" is tokenized as a RECEIVED, a ':' and a LASTWEEK token.
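The two rules above can be made concrete with a small toy lexer. The sketch below is plain Java using java.util.regex, not ANTLR's actual implementation. The class name MaximalMunchDemo and the token name COLON are made up for illustration (the question's grammar uses the literal ':' directly). At each position it tries every rule, keeps the longest match, and breaks ties in favour of the rule defined first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration of "longest match, then first rule wins".
// This is NOT how ANTLR is implemented; it only mimics the observable behaviour.
final class MaximalMunchDemo {

    // Rules in definition order, mirroring the question's grammar.
    // COLON is a hypothetical name; the grammar uses the literal ':' instead.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("RECEIVED", Pattern.compile("received"));
        RULES.put("TEXT", Pattern.compile("[^ :]+"));
        RULES.put("LASTWEEK", Pattern.compile("last week"));
        RULES.put("COLON", Pattern.compile(":"));
        RULES.put("SPACES", Pattern.compile("[ \\t\\r\\n]"));
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestLength = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input).region(pos, input.length());
                // Only a strictly longer match replaces the current best,
                // so on a tie the rule defined first is kept.
                if (m.lookingAt() && m.end() - pos > bestLength) {
                    bestName = rule.getKey();
                    bestLength = m.end() - pos;
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            if (!bestName.equals("SPACES")) { // SPACES is skipped, as in the grammar
                tokens.add(bestName + "(" + input.substring(pos, pos + bestLength) + ")");
            }
            pos += bestLength;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("received:last week"));
        System.out.println(tokenize("subject:last week"));
    }
}
```

For "received:last week" this prints [RECEIVED(received), COLON(:), LASTWEEK(last week)]. The second input also ends in LASTWEEK(last week), because nothing in this process looks at what came before or at what the parser wants.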
EDIT
Is it possible to have a grammar that can parse this "received:last week" as "received" "last week" only if the "last week" is preceded by "received" but if for example I have "subject:last week" to be parsed as "subject" "last" "week"
You can use lexical modes to make the lexer context-sensitive. You must then create separate lexer and parser grammars, which could look like this:
TestLexer.g4
lexer grammar TestLexer;
RECEIVED : 'received' -> pushMode(RECEIVED_MODE);
SUBJECT : 'subject';
TEXT : ~[ :]+;
COLON : ':';
SPACES : SPACE+ -> skip;
fragment SPACE : [ \t\r\n];
mode RECEIVED_MODE;
LASTWEEK : 'last' SPACE+ 'week' -> popMode;
RECEIVED_MODE_COLON : ':' -> type(COLON);
RECEIVED_MODE_TEXT : ~[ :]+ -> type(TEXT), popMode;
You can use the lexer above in your parser grammar like this:
TestParser.g4
parser grammar TestParser;
options {
tokenVocab=TestLexer;
}
...
Now "received:last week" will be tokenized as:
'received' `received`
COLON `:`
LASTWEEK `last week`
EOF `<EOF>`
and "subject:last week" will be tokenized as:
'subject' `subject`
COLON `:`
TEXT `last`
TEXT `week`
EOF `<EOF>`
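To see why the mode switch yields these two token streams, here is a toy model of a mode stack in plain Java. It is only a sketch under stated assumptions, not ANTLR's implementation: the class ModeStackDemo, the Rule helper and the Action enum are hypothetical names, and the regular expressions are hand-translated from the lexer rules above. Each mode has its own rule list, and a matched rule may push or pop a mode before lexing continues.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of lexer modes; NOT ANTLR's implementation.
final class ModeStackDemo {

    enum Action { NONE, PUSH_RECEIVED_MODE, POP_MODE }

    // One lexer rule: emitted token type (null means skip), pattern, mode action.
    static final class Rule {
        final String type;
        final Pattern pattern;
        final Action action;

        Rule(String type, String regex, Action action) {
            this.type = type;
            this.pattern = Pattern.compile(regex);
            this.action = action;
        }
    }

    // Default mode, in the same order as TestLexer.g4.
    static final List<Rule> DEFAULT_MODE = List.of(
            new Rule("RECEIVED", "received", Action.PUSH_RECEIVED_MODE),
            new Rule("SUBJECT", "subject", Action.NONE),
            new Rule("TEXT", "[^ :]+", Action.NONE),
            new Rule("COLON", ":", Action.NONE),
            new Rule(null, "[ \\t\\r\\n]+", Action.NONE));

    // RECEIVED_MODE: the only place where "last week" is a single token.
    static final List<Rule> RECEIVED_MODE = List.of(
            new Rule("LASTWEEK", "last[ \\t\\r\\n]+week", Action.POP_MODE),
            new Rule("COLON", ":", Action.NONE),
            new Rule("TEXT", "[^ :]+", Action.POP_MODE));

    static List<String> tokenize(String input) {
        Deque<List<Rule>> modes = new ArrayDeque<>();
        modes.push(DEFAULT_MODE);
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestLength = 0;
            // Only the rules of the mode on top of the stack are considered.
            for (Rule rule : modes.peek()) {
                Matcher m = rule.pattern.matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() - pos > bestLength) {
                    best = rule; // longest match; ties keep the earlier rule
                    bestLength = m.end() - pos;
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            if (best.type != null) {
                tokens.add(best.type + "(" + input.substring(pos, pos + bestLength) + ")");
            }
            if (best.action == Action.PUSH_RECEIVED_MODE) {
                modes.push(RECEIVED_MODE);
            } else if (best.action == Action.POP_MODE) {
                modes.pop();
            }
            pos += bestLength;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("received:last week"));
        System.out.println(tokenize("subject:last week"));
    }
}
```

Because the LASTWEEK rule exists only in RECEIVED_MODE, the default mode can never produce it, so "last" and "week" after "subject:" fall back to two TEXT tokens.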
EDIT II
You could also move the creation of last week into the parser like this:
received
: RECEIVED ':' last_week
;
subject
: SUBJECT ':' text
;
last_week
: LAST WEEK
;
text
: TEXT
| LAST
| WEEK
;
RECEIVED : 'received';
SUBJECT : 'subject';
LAST : 'last';
WEEK : 'week';
TEXT : ~[ :]+;