Antlr4 for Python: Parse data into parts
I have a very simple Antlr4 grammar:
grammar settings;

query
 : COLUMN OPERATOR (SETTING|SCALAR)
 ;

COLUMN
 : [a-z_]+
 ;

OPERATOR
 : ('='|'>'|'<')
 ;

SETTING
 : 'setting(' [a-z_]+ ')'
 ;

SCALAR
 : [a-z_]+
 ;
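For input strings like total_sales>setting(min_total_sales) (a database column name, an operator, and a value), I want to determine which part is the column name, which is the operator, and which is the value. I wrote some Python code for that: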
import re

from antlr4 import InputStream, CommonTokenStream
from settingsLexer import settingsLexer
from settingsParser import settingsParser

settings = {
    'min_total_sales': 1000
}
conditions = 'total_sales>setting(min_total_sales)'
lexer = settingsLexer(InputStream(conditions))
stream = CommonTokenStream(lexer)
parser = settingsParser(stream)
tree = parser.query()

# raw string, so that \( and \) reach the regex engine unchanged
regex = re.compile(r'^setting\((?P<setting_name>[a-z_]+)\)$')
column = None
operator = None
value = None
for child in tree.getChildren():
    text = child.getText()
    # how to match what is child: column or operator or value???
    # this for value defining
    if match := regex.match(text):
        setting_name = match.group('setting_name')
        print(f'We should get value from setting named `{setting_name}`')
        min_total_sales = settings['min_total_sales']
    else:
        print(f'We got a simple scalar value: {text}')
        min_total_sales = int(text)
How do I tell what each child is: column name, operator, or value?
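Why involve regex at all? After the input is parsed, the tree has accessor methods that correspond to the rules and tokens it matched. So the object that parser.query() returns, which is the context for the parser rule: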
query
: COLUMN OPERATOR (SETTING|SCALAR)
;
will have 4 methods: COLUMN(), OPERATOR(), SETTING() and SCALAR(). Use them to extract the data you want:
tree = parser.query()
column = tree.COLUMN()
operator = tree.OPERATOR()
setting = tree.SETTING()
print(f"column={column}, operator={operator}, setting={setting}")
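As a rough sketch of how those accessors could replace both the regex and the loop over children: each accessor returns the matching token node, or None when that token is not part of the parse, so checking SETTING() against None tells the two kinds of value apart. The helper name describe_query is made up for this example, and it assumes the grammar from the question, where the whole setting(...) text is a single SETTING token.

```python
def describe_query(tree, settings):
    """Split a parsed `query` context into (column, operator, value).

    `tree` is assumed to be what parser.query() returned for the grammar
    in the question; `settings` maps setting names to their values.
    """
    column = tree.COLUMN().getText()
    operator = tree.OPERATOR().getText()

    setting = tree.SETTING()  # None when the value is not a setting(...)
    if setting is not None:
        # The token text looks like "setting(min_total_sales)";
        # strip the fixed prefix and the closing parenthesis.
        setting_name = setting.getText()[len("setting("):-1]
        value = settings[setting_name]
    else:
        # Fall back to the SCALAR token, if the parse produced one.
        scalar = tree.SCALAR()
        value = scalar.getText() if scalar is not None else None
    return column, operator, value
```

For the input from the question, describe_query(tree, settings) would be expected to give the column total_sales, the operator > and the value looked up under min_total_sales.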
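Also, I would not glue setting and min_total_sales together into one big token; let the parser combine them instead. Otherwise input like total_sales>setting ( min_total_sales ) will fail to match because of the spaces.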
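grammar settings;

query
 : column OPERATOR value EOF
 ;

column
 : ID
 ;

value
 : setting
 | ID
 ;

setting
 : SETTING '(' ID ')'
 ;

OPERATOR
 : ('='|'>'|'<')
 ;

// SETTING must come before ID: both match the text "setting", and when two
// rules match the same length the lexer picks the one defined first.
SETTING
 : 'setting'
 ;

// COLUMN and SCALAR had the identical pattern [a-z_]+, so the lexer could only
// ever produce the first of them. A single ID token replaces both, and the
// parser rules decide whether an ID is a column or a value.
ID
 : [a-z_]+
 ;

SPACES
 : [ \t\r\n] -> skip
 ;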
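To see why whitespace stops mattering once the lexer skips it, and why rule order matters when the text setting also fits the generic [a-z_]+ pattern, here is a toy tokenizer in plain Python. It is not ANTLR's implementation, only an imitation of two documented lexer behaviours: the longest match wins, and on a tie the rule listed first wins. The rule names WORD, LPAREN and RPAREN are invented for the illustration.

```python
import re

def tokenize(text, rules):
    """Toy lexer: longest match wins, ties go to the rule listed first."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so an earlier rule keeps its place on a tie.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "SPACES":  # imitate `-> skip`
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

PUNCTUATION = [("LPAREN", r"\("), ("RPAREN", r"\)"), ("SPACES", r"[ \t\r\n]")]
keyword_first = [("SETTING", r"setting"), ("WORD", r"[a-z_]+")] + PUNCTUATION
generic_first = [("WORD", r"[a-z_]+"), ("SETTING", r"setting")] + PUNCTUATION

# Spaces are dropped, so both spellings yield the same token list.
print(tokenize("setting(min_total_sales)", keyword_first))
print(tokenize("setting ( min_total_sales )", keyword_first))

# With the generic rule first, "setting" is never seen as SETTING.
print(tokenize("setting ( min_total_sales )", generic_first))
```

A longer word such as settings still comes out as WORD under either ordering, because the longest match is checked before rule order.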