Can an element contain attributes, as parsed by a parser generated by ANTLR? If so, how?
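I'm following this tutorial and have successfully replicated its behavior, except that I'm using ANTLR 4.7 instead of the 4.5 the tutorial uses.
I'm trying to build a DSL for an expense tracker.
I'd like to know whether each element can have attributes.
For example, this is what it looks like now.
This is the code of todo.g4, as found at https://github.com/simkimsia/learn-antlr-web-js/blob/master/todo.g4:
grammar todo;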
elements
: (element|emptyLine)* EOF
;
element
: '*' ( ' ' | '\t' )* CONTENT NL+
;
emptyLine
: NL
;
NL
: '\r' | '\n'
;
CONTENT
: [a-zA-Z0-9_][a-zA-Z0-9_ \t]*
;
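What I mean is that the element will also have 2 attributes, amount and payee. For simplicity, I'll use the same sentence structure every time so that it is easier to parse.
The format will be pay [payee] [amount]
An example is pay Acme Corp 123,789.45
so the payee is Acme Corp and the amount is 12378945, an integer representing the amount in cents.
Another example is pay Banana Inc 700
so the payee is Banana Inc and the amount is 70000, an integer representing the amount in cents.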
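The conversion to cents in these examples (123,789.45 becomes 12378945, 700 becomes 70000) is not something the grammar does by itself; it can be done on the NUMBER token text after parsing, for example in a listener. A minimal sketch, assuming ',' is only ever a thousands separator and '.' the decimal point; the class name AmountToCents is hypothetical.

```java
import java.math.BigDecimal;

public class AmountToCents {
    /**
     * Converts NUMBER token text such as "123,789.45" or "700" to an amount in cents.
     * Assumes ',' is a thousands separator and '.' is the decimal point.
     */
    public static long toCents(String numberText) {
        // Drop thousands separators: "123,789.45" -> "123789.45"
        String plain = numberText.replace(",", "");
        // Shift the decimal point two places: 123789.45 -> 12378945.
        // longValueExact() throws ArithmeticException if fractions of a cent remain.
        return new BigDecimal(plain).movePointRight(2).longValueExact();
    }

    public static void main(String[] args) {
        System.out.println(toCents("123,789.45")); // 12378945
        System.out.println(toCents("700"));        // 70000
    }
}
```

BigDecimal is used instead of double to avoid binary rounding, and longValueExact() rejects input such as 1.234 instead of silently truncating it.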
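I guess I need to change todo.g4 and then regenerate the parser.
Can an element have additional attributes?
If so, how do I get started?
Update
Here are my recent attempts, latest first:
I just figured out how to use grun and the TestRig. Thanks to @Raven for the hint.
Latest attempt: my latest expense.g4 (the only difference from the earlier attempt is the regular expression for the amount, NUMBER)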
grammar expense;
payments: (payment NL)* ;
payment: PAY receiver amount=NUMBER ;
receiver: surname=ID (lastname=ID)? ;
PAY: 'pay' ;
NUMBER: ([0-9]+(','[0-9]+)*)('.'[0-9]*)?;
ID: [a-zA-Z0-9_]+ ;
NL: '\n' | '\r\n' ;
WS: [\t ]+ -> skip ;
Earlier attempt: this is my expense.g4
grammar expense;
payments: (payment NL)* ;
payment: PAY receiver amount=NUMBER ;
receiver: surname=ID (lastname=ID)? ;
PAY: 'pay' ;
NUMBER: [0-9]+ (',' [0-9]+)+ ('.' [0-9]+)? ;
ID: [a-zA-Z0-9_]+ ;
NL: '\n' | '\r\n' ;
WS: [\t ]+ -> skip ;
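The two attempts differ only in NUMBER. The earlier rule (the same one appears in Raven's suggested grammar) repeats the comma group with +, so it requires at least one comma, and inputs like 700 or 456789.00 cannot be a NUMBER. A quick check with Java regex transcriptions of both rules; this is only an approximation of the lexer, since it matches whole strings, whereas ANTLR tokenizes by longest match with rule order as the tie-break. The class name NumberRuleCheck is hypothetical.

```java
import java.util.regex.Pattern;

public class NumberRuleCheck {
    // Earlier attempt: NUMBER: [0-9]+ (',' [0-9]+)+ ('.' [0-9]+)? ;
    static final Pattern EARLIER = Pattern.compile("[0-9]+(,[0-9]+)+(\\.[0-9]+)?");
    // Latest attempt:  NUMBER: ([0-9]+(','[0-9]+)*)('.'[0-9]*)? ;
    static final Pattern LATEST = Pattern.compile("([0-9]+(,[0-9]+)*)(\\.[0-9]*)?");

    public static void main(String[] args) {
        for (String s : new String[] {"123,789.45", "700", "456789.00"}) {
            System.out.printf("%-12s earlier=%b latest=%b%n",
                    s, EARLIER.matcher(s).matches(), LATEST.matcher(s).matches());
        }
    }
}
```

For 700 and 456789.00 only the latest rule matches; 123,789.45 matches both. Under the earlier rule, 700 would presumably be lexed as an ID instead, so payment would not find its NUMBER.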
I'm not entirely sure what exactly you want, but for the examples provided, this grammar should do the job:
payments: (payment NL)* ;
payment: PAY receiver amount=NUMBER ;
receiver: surname=ID (lastname=ID)? ;
PAY: 'pay' ;
NUMBER: [0-9]+ (',' [0-9]+)+ ('.' [0-9]+)? ;
ID: [a-zA-Z0-9_]+ ;
NL: '\n' | '\r\n' ;
WS: [\t ]+ -> skip ;
If that's what you need, I'll add more explanation as required...
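To make the "attributes" concrete, here is a plain-Java sketch, without ANTLR, that mirrors the payment and receiver rules by hand and keeps the values that the labels surname, lastname and amount capture. PaymentLine and its methods are hypothetical, the regexes only approximate the lexer, and it uses the asker's latest NUMBER rule so that 700 is accepted. In an actual ANTLR setup these values come from the generated context objects instead; a token label such as amount=NUMBER should give the generated context class a Token field with that name (worth checking against the generated parser).

```java
import java.util.regex.Pattern;

public class PaymentLine {
    // Plain-Java approximations of the ID and NUMBER lexer rules
    static final Pattern ID = Pattern.compile("[a-zA-Z0-9_]+");
    static final Pattern NUMBER = Pattern.compile("[0-9]+(,[0-9]+)*(\\.[0-9]*)?");

    final String surname;  // label surname=ID
    final String lastname; // label lastname=ID, null when the optional ID is absent
    final String amount;   // label amount=NUMBER, raw token text

    PaymentLine(String surname, String lastname, String amount) {
        this.surname = surname;
        this.lastname = lastname;
        this.amount = amount;
    }

    // True if the word would reach the parser as an ID: on an equal-length match,
    // "pay" becomes PAY and all-digit words become NUMBER, since those rules come first
    static boolean isIdToken(String w) {
        return ID.matcher(w).matches() && !w.equals("pay") && !NUMBER.matcher(w).matches();
    }

    // Mirrors: payment: PAY receiver amount=NUMBER ; receiver: surname=ID (lastname=ID)? ;
    static PaymentLine parse(String line) {
        String[] w = line.trim().split("[ \t]+");
        // PAY, one or two IDs, then NUMBER: 3 or 4 words in total
        if (w.length < 3 || w.length > 4 || !w[0].equals("pay")) {
            throw new IllegalArgumentException("not a payment: " + line);
        }
        String amount = w[w.length - 1];
        String lastname = (w.length == 4) ? w[2] : null;
        if (!NUMBER.matcher(amount).matches() || !isIdToken(w[1])
                || (lastname != null && !isIdToken(lastname))) {
            throw new IllegalArgumentException("not a payment: " + line);
        }
        return new PaymentLine(w[1], lastname, amount);
    }

    public static void main(String[] args) {
        PaymentLine p = parse("pay Acme Corp 123,789.45");
        System.out.println(p.surname + " | " + p.lastname + " | " + p.amount);
    }
}
```

The point of the sketch is the shape of the result: one object per pay line with named fields, which is what a listener would typically build from the parse tree.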
"I am guessing I need to change the todo.g4 and then re generate the parser."
Of course; you regenerate after every change. For me it's:
$ a4 Question.g4
$ javac Q*.java
$ grun Question elements -tokens -diagnostics t.text
where
$ alias
alias a4='java -jar /usr/local/lib/antlr-4.6-complete.jar'
alias grun='java org.antlr.v4.gui.TestRig'
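The more precisely you describe the content, the more easily you run into ambiguity problems. For example, suppose you have two rules: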
payment : 'pay' [payee] [amount]
free_text : ... any character ...
Consider the following:
* pay Federico Tomassetti 10 € for the tutorial
* pay Federico Tomassetti 10
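This is ambiguous: it can be matched by both rules, but it will end up parsed as free text, because "€ for the tutorial" does not fit payment.
If you later change the payment rule to accept more information after the amount: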
payment : 'pay' [payee] [amount] payment_info
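the above would be matched by payment (when there is an ambiguity, ANTLR picks the first rule). The good news is that ANTLR 4 is very good at resolving ambiguities; if necessary, it reads the whole file.
Ambiguous tokens and precedence rules have come up a lot in posts over the last three weeks; they are worth reading.
Mixing Raven's grammar with yours, here is a possible solution:
File Question.g4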
grammar Question;
elements
@init {System.out.println("Question last update 1432");}
: ( element | emptyLine )* EOF
;
element
: '*' content NL
;
content
: payment //{System.out.println("Payement found " + $payment.text);}
| free_text {System.out.println("Free text found " + $free_text.text);}
;
payment
: PAY receiver amount=NUMBER
{System.out.println("Payement found " + $amount.text + " to " + $receiver.text);}
;
receiver
: surname=WORD ( lastname=WORD )?
;
free_text
: ( WORD | PAY | NUMBER )+
;
emptyLine
: NL
;
PAY : 'pay' ;
WORD : LETTER ( LETTER | DIGIT | '_' )* ;
NUMBER : DIGIT+ ( ',' DIGIT+ )? ( '.' DIGIT+ )? ;
NL : [\r\n]
| '\r\n'
;
//WS : [ \t]+ -> skip ; // $payment.text => payAcmeCorp123,789.45
WS : [ \t]+ -> channel(HIDDEN) ; // spaces are needed to nicely display $payment.text
fragment DIGIT : [0-9] ;
fragment LETTER : [a-zA-Z] ;
File t.text
* play with ANTLR 4
* write a tutorial
* pay Acme Corp 123,789.45
* pay Banana Inc 700
* pay Federico Tomassetti 10 € for the tutorial
Run:
$ grun Question elements -tokens -diagnostics t.text
line 5:29 token recognition error at: '€'
[@0,0:0='*',<'*'>,1:0]
[@1,1:1=' ',<WS>,channel=1,1:1]
[@2,2:5='play',<WORD>,1:2]
[@3,6:6=' ',<WS>,channel=1,1:6]
[@4,7:10='with',<WORD>,1:7]
[@5,11:11=' ',<WS>,channel=1,1:11]
[@6,12:16='ANTLR',<WORD>,1:12]
[@7,17:17=' ',<WS>,channel=1,1:17]
[@8,18:18='4',<NUMBER>,1:18]
[@9,19:19='\n',<NL>,1:19]
[@10,20:20='*',<'*'>,2:0]
[@11,21:21=' ',<WS>,channel=1,2:1]
[@12,22:26='write',<WORD>,2:2]
[@13,27:27=' ',<WS>,channel=1,2:7]
[@14,28:28='a',<WORD>,2:8]
[@15,29:29=' ',<WS>,channel=1,2:9]
[@16,30:37='tutorial',<WORD>,2:10]
[@17,38:38='\n',<NL>,2:18]
...
[@56,136:135='<EOF>',<EOF>,7:0]
Question last update 1432
Free text found play with ANTLR 4
Free text found write a tutorial
line 3:26 reportAttemptingFullContext d=2 (content), input='pay Acme Corp 123,789.45
'
...
Payement found 700 to Banana Inc
Free text found pay Federico Tomassetti 10 for the tutorial
As you can see, the € symbol is not recognized. You may need a CONTENT rule similar to FIELDTEXT, and then you'll run into trouble...
Federico's Mega tutorial is a good start. For the nitty-gritty details, see The Definitive ANTLR 4 Reference or the online documentation at www.antlr.org.
Status as of October 24, 2017, 19:00 UTC+1.
Your grammar is perfect. I tested it thoroughly in Java.
File Expense.g4:
grammar Expense;
payments
@init {System.out.println("Expense last update 1853");}
: (payment NL)*
;
payment
: PAY receiver amount=NUMBER
{System.out.println("Payement found " + $amount.text + " to " + $receiver.text);}
;
receiver
: surname=ID (lastname=ID)?
;
PAY : 'pay' ;
NUMBER : ([0-9]+(','[0-9]+)*)('.'[0-9]*)? ;
ID : [a-zA-Z0-9_]+ ;
NL : '\n' | '\r\n' ;
WS : [\t ]+ -> channel(HIDDEN) ; // keep the spaces (without spaces ==> paydeltaco98)
File ExpenseMyListener.java:
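public class ExpenseMyListener extends ExpenseBaseListener {
    ExpenseParser parser;

    public ExpenseMyListener(ExpenseParser parser) { this.parser = parser; }

    // Called after the whole payments rule (and thus every payment in it) has been exited
    @Override
    public void exitPayments(ExpenseParser.PaymentsContext ctx) {
        System.out.println(">>> in ExpenseMyListener for paymentsss");
        System.out.println(">>> there are " + ctx.payment().size() + " elements in the list of payments");
        for (int i = 0; i < ctx.payment().size(); i++) {
            // getText() on a context concatenates only the tokens in the parse tree,
            // so the hidden-channel spaces are lost (e.g. "paydeltaco98")
            System.out.println(ctx.payment(i).getText());
        }
    }

    @Override
    public void exitPayment(ExpenseParser.PaymentContext ctx) {
        System.out.println(">>> in ExpenseMyListener for payment");
        // Reading the rule's token range from the token stream keeps hidden-channel tokens (spaces)
        System.out.println(parser.getTokenStream().getText(ctx));
    }
}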
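The two listener outputs differ (pay delta co 98 versus paydeltaco98) because the text is read from different places. A simplified plain-Java model, not ANTLR code: Tok is a hypothetical stand-in for a token carrying a hidden-channel flag; treeText mimics getText() on a parse-tree context (only tokens that reached the parser), and streamText mimics reading the rule's token range from the token stream, hidden tokens included.

```java
import java.util.List;
import java.util.stream.Collectors;

public class HiddenChannelDemo {
    // Hypothetical stand-in for a token: its text and whether it is on the hidden channel
    static final class Tok {
        final String text;
        final boolean hidden;
        Tok(String text, boolean hidden) { this.text = text; this.hidden = hidden; }
    }

    // Like getText() on a context: hidden-channel tokens never enter the parse tree
    static String treeText(List<Tok> toks) {
        return toks.stream().filter(t -> !t.hidden).map(t -> t.text).collect(Collectors.joining());
    }

    // Like getText(ctx) on the token stream: every token in the rule's range
    static String streamText(List<Tok> toks) {
        return toks.stream().map(t -> t.text).collect(Collectors.joining());
    }

    public static void main(String[] args) {
        List<Tok> line = List.of(
                new Tok("pay", false), new Tok(" ", true), new Tok("delta", false),
                new Tok(" ", true), new Tok("co", false), new Tok(" ", true), new Tok("98", false));
        System.out.println(treeText(line));   // paydeltaco98
        System.out.println(streamText(line)); // pay delta co 98
    }
}
```

With WS -> skip instead of channel(HIDDEN), the spaces never enter the token stream at all, so both ways of reading the text would give paydeltaco98.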
File test_expense.java:
import org.antlr.v4.runtime.ANTLRFileStream;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.tree.ParseTreeWalker;
import java.io.IOException;

public class test_expense {
    public static void main(String[] args) throws IOException {
        // Read the file named on the command line.
        // (ANTLRFileStream is deprecated as of ANTLR 4.7; CharStreams.fromFileName(args[0]) replaces it.)
        ANTLRInputStream input = new ANTLRFileStream(args[0]);
        ExpenseLexer lexer = new ExpenseLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        ExpenseParser parser = new ExpenseParser(tokens);
        // Parse from the start rule; the grammar's embedded actions print while parsing
        ParseTree tree = parser.payments();
        System.out.println("---parsing ended");
        // Walk the finished tree, firing the listener's enter/exit callbacks
        ParseTreeWalker walker = new ParseTreeWalker();
        ExpenseMyListener my_listener = new ExpenseMyListener(parser);
        System.out.println(">>>> about to walk");
        walker.walk(my_listener, tree);
    }
}
Input file top.text:
pay Acme Corp 123,456
pay Banana Inc 456789.00
pay charlie pte 123,456.89
pay delta co 98
Run:
$ export CLASSPATH=".:/usr/local/lib/antlr-4.6-complete.jar"
$ alias
alias a4='java -jar /usr/local/lib/antlr-4.6-complete.jar'
alias grun='java org.antlr.v4.gui.TestRig'
$ a4 Expense.g4
$ javac Ex*.java
$ javac test_expense.java
$ grun Expense payments -tokens -diagnostics top.text
[@0,0:2='pay',<'pay'>,1:0]
[@1,3:3=' ',<WS>,channel=1,1:3]
[@2,4:7='Acme',<ID>,1:4]
[@3,8:8=' ',<WS>,channel=1,1:8]
[@4,9:12='Corp',<ID>,1:9]
...
[@32,90:89='<EOF>',<EOF>,5:0]
Expense last update 1853
Payement found 123,456 to Acme Corp
Payement found 456789.00 to Banana Inc
Payement found 123,456.89 to charlie pte
Payement found 98 to delta co
$ java test_expense top.text
Expense last update 1853
Payement found 123,456 to Acme Corp
Payement found 456789.00 to Banana Inc
Payement found 123,456.89 to charlie pte
Payement found 98 to delta co
---parsing ended
>>>> about to walk
>>> in ExpenseMyListener for payment
pay Acme Corp 123,456
>>> in ExpenseMyListener for payment
pay Banana Inc 456789.00
>>> in ExpenseMyListener for payment
pay charlie pte 123,456.89
>>> in ExpenseMyListener for payment
pay delta co 98
>>> in ExpenseMyListener for paymentsss
>>> there are 4 elements in the list of payments
payAcmeCorp123,456
payBananaInc456789.00
paycharliepte123,456.89
paydeltaco98