ANTLR Grammar to distinguish words, alphanumeric and numbers
I am still learning ANTLR and would appreciate help with an improved version of this grammar.
Here is an input string:
SYS [ErrorCode is not Available] : Transaction ID:
d9d1211e-d273-40e1-bdd0-e4c9a8036ef3 . This can be ignored safely to:
map To Not availble : works in progress
Expected parser output:
- Words -> SYS
- Specials -> [
- Words -> ErrorCode
- Words -> is
- .....
- Alphanumeric -> d9d1211e-d273-40e1-bdd0-e4c9a8036ef3
- ...
The ANTLR grammar I have come up with so far:
grammar Expressions;

expression
    :
    | numbers? specials? words (numbers? specials? words)*
    | numbers words specials
    | specials words numbers
    | specials numbers words
    | words specials numbers
    | words numbers specials
    | specials specials? (specials specials? )*
    | words words? (words words?)*
    | numbers numbers? (numbers numbers?)*
    ;

words
    : CHARACTERS
    ;

numbers
    : NUMBERS
    ;

specials
    : AND
    | OR
    | EQUALS
    | ASSIGN
    | GT
    | LT
    | GTE
    | LTE
    | NOTEQUALS
    | NOT
    | PLUS
    | MINUS
    | IF
    | COLON
    | TLB
    | TRB
    | FLB
    | FRB
    | DOT
    ;

AND       : '&&' ;
OR        : '||' ;
EQUALS    : '==' ;
ASSIGN    : '=' ;
GT        : '>' ;
LT        : '<' ;
GTE       : '>=' ;
LTE       : '<=' ;
NOTEQUALS : '!=' ;
NOT       : '!' ;
PLUS      : '+' ;
MINUS     : '-' ;
IF        : 'if' ;
COLON     : ':' ;
TLB       : '[' ;
TRB       : ']' ;
FLB       : ')' ;
FRB       : '(' ;
DOT       : '.' ;

CHARACTERS
    : [a-zA-Z] [a-zA-Z]*
    ;

NUMBERS
    : [0-9]+
    | ([0-9]+)? '.' ([0-9])+
    ;

WS  : [ \t\r\n]+ -> skip
    ;
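I also wrote this simple Go program to find out whether a string contains any digits; it replaces each whitespace-separated field that contains a digit with a numbered placeholder ($0, $1, ...).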
package main

import (
	"fmt"
	"strconv"
	"strings"
)

func main() {
	someString := "ID:8e038845-bd81-4218-9769-8406241fbb34 Operation is failed java.core.CoreRuntimeException: java.core.CoreRuntimeException: The JDBC connection information provided is incomplete"
	// Split the input on whitespace.
	words := strings.Fields(someString)
	var tokens []string
	var x int // index of the next placeholder
	for _, j := range words {
		if HasDigit(j) {
			// Replace any field containing a digit with $0, $1, ...
			dynamic := "$" + strconv.Itoa(x)
			tokens = append(tokens, dynamic)
			x++
		} else {
			tokens = append(tokens, j)
		}
	}
	tokenized := strings.Join(tokens, " ")
	fmt.Println(tokenized)
}

// HasDigit reports whether s contains at least one ASCII digit.
func HasDigit(s string) bool {
	for _, r := range s {
		if '0' <= r && r <= '9' {
			return true
		}
	}
	return false
}