ANTLR: lexing bash files, especially heredoc
I'm using ANTLR to lex bash files (for syntax coloring). Is it possible to write lexer rules with a dynamic end, such as heredocs:
cat <<ENDTEXT
hello world,
this text may contain
any letters, even ' and "
ENDTEXT
or
cat <<FOO
here a different end-word
is used
FOO
This is only possible with a predicate.
Here's a quick example:
lexer grammar BashLexer;
@members {
    private boolean heredocEndAhead(String partialHeredoc) {
        if (this.getCharPositionInLine() != 0) {
            // If the lexer is not at the start of a line, no end delimiter is possible
            return false;
        }

        // Get the delimiter: strip "<<" or "<<-" and any whitespace from the first line
        String firstLine = partialHeredoc.split("\r?\n|\r")[0];
        String delimiter = firstLine.replaceAll("^<<-?\\s*", "");

        // Compare every character of the delimiter (n <= length, so the last one is checked too)
        for (int n = 1; n <= delimiter.length(); n++) {
            if (this._input.LA(n) != delimiter.charAt(n - 1)) {
                return false;
            }
        }

        // If we get to this point, we know there is an end delimiter ahead in the char stream. Make
        // sure it is followed by whitespace (or EOF). If we don't do this, then "FOOS" would also
        // be considered the end for the delimiter "FOO"
        int charAfterDelimiter = this._input.LA(delimiter.length() + 1);
        return charAfterDelimiter == EOF || Character.isWhitespace(charAfterDelimiter);
    }
}
HEREDOC
: '<<' '-'? [ \t]* [a-zA-Z_] [a-zA-Z_0-9]* NL ( {!heredocEndAhead(getText())}? . )* [a-zA-Z_] [a-zA-Z_0-9]*
;
ANY
: .
;
fragment NL
: '\r'? '\n'
| '\r'
;
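To see what the predicate checks without generating a lexer, here is a minimal plain-Java sketch that mirrors `heredocEndAhead` on an ordinary `String`. The class `HeredocEndCheck` and its helpers `delimiterOf` and `la` are hypothetical: `la(input, pos, n)` stands in for `_input.LA(n)`, and the "previous character is a line break" test stands in for `getCharPositionInLine() == 0`.

```java
// Plain-Java mirror of the grammar's heredocEndAhead predicate, runnable without ANTLR.
// Assumption: 'input' is the whole source and 'pos' is the index that _input.LA(1) would read.
class HeredocEndCheck {

    // Same delimiter extraction as the grammar: drop "<<" or "<<-" and any whitespace.
    static String delimiterOf(String partialHeredoc) {
        String firstLine = partialHeredoc.split("\r?\n|\r")[0];
        return firstLine.replaceAll("^<<-?\\s*", "");
    }

    // Stand-in for _input.LA(n): the character n positions ahead (1-based), or -1 at end of input.
    static int la(String input, int pos, int n) {
        int i = pos + n - 1;
        return i < input.length() ? input.charAt(i) : -1;
    }

    static boolean heredocEndAhead(String input, int pos, String delimiter) {
        // Stand-in for getCharPositionInLine() == 0: start of input or right after a line break.
        if (pos > 0 && input.charAt(pos - 1) != '\n' && input.charAt(pos - 1) != '\r') {
            return false;
        }
        // Compare all delimiter characters, including the last one.
        for (int n = 1; n <= delimiter.length(); n++) {
            if (la(input, pos, n) != delimiter.charAt(n - 1)) {
                return false;
            }
        }
        // Reject longer words, e.g. "ENDTEXTS" for the delimiter "ENDTEXT".
        int after = la(input, pos, delimiter.length() + 1);
        return after == -1 || Character.isWhitespace(after);
    }

    public static void main(String[] args) {
        String input = "cat <<ENDTEXT\nhello world,\nENDTEXTS ENDTEXT\nENDTEXT";
        String delimiter = delimiterOf("<<ENDTEXT\nhello world,");
        System.out.println(delimiter); // ENDTEXT
        System.out.println(heredocEndAhead(input, input.indexOf("ENDTEXTS"), delimiter));       // false
        System.out.println(heredocEndAhead(input, input.indexOf(" ENDTEXT\n") + 1, delimiter)); // false
        System.out.println(heredocEndAhead(input, input.lastIndexOf("ENDTEXT"), delimiter));    // true
    }
}
```

The three calls show the three ways a candidate end can be rejected or accepted: a longer word (`ENDTEXTS`), a delimiter that is not at the start of a line, and the real closing line.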
The grammar above will tokenize the input:
cat <<ENDTEXT
hello world,
ENDTEXTS ENDTEXT
this text may contain
any letters, even ' and "
ENDTEXT
like this:
ANY `c`
ANY `a`
ANY `t`
ANY ` `
HEREDOC `<<ENDTEXT\nhello world, \nENDTEXTS ENDTEXT\nthis text may contain \nany letters, even ' and "\nENDTEXT`
EOF `<EOF>`
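The expected `HEREDOC` token can also be cross-checked with a small hand-written scanner that follows the same steps as the rule: match `<<`, an optional `-`, spaces/tabs, the delimiter word and a line break, then consume characters until the delimiter appears at the start of a line followed by whitespace or end of input. This is a hypothetical sketch (`HeredocScan` and `scan` are not part of ANTLR or of the grammar), assuming the index where `<<` starts is already known.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical hand-written equivalent of the HEREDOC lexer rule (not generated by ANTLR).
class HeredocScan {

    // Same prefix as the rule: "<<", optional "-", spaces/tabs, the delimiter word, one line break.
    private static final Pattern HEAD =
            Pattern.compile("<<-?[ \\t]*([a-zA-Z_][a-zA-Z_0-9]*)(\\r?\\n|\\r)");

    // Returns the heredoc text starting at 'start', or null if no terminated heredoc starts there.
    static String scan(String input, int start) {
        Matcher m = HEAD.matcher(input);
        m.region(start, input.length());
        if (!m.lookingAt()) {
            return null;
        }
        String delimiter = m.group(1);

        // Consume characters until the delimiter appears at the start of a line,
        // followed by whitespace or end of input (the predicate's condition).
        for (int pos = m.end(); pos < input.length(); pos++) {
            char prev = input.charAt(pos - 1);
            boolean atLineStart = prev == '\n' || prev == '\r';
            if (atLineStart && input.startsWith(delimiter, pos)) {
                int after = pos + delimiter.length();
                if (after == input.length() || Character.isWhitespace(input.charAt(after))) {
                    return input.substring(start, after);
                }
            }
        }
        return null; // unterminated heredoc: this sketch simply gives up
    }

    public static void main(String[] args) {
        String input = "cat <<ENDTEXT\nhello world,\nENDTEXTS ENDTEXT\n"
                + "this text may contain\nany letters, even ' and \"\nENDTEXT";
        String token = scan(input, input.indexOf("<<"));
        // Print with escaped line breaks, in the same style as the token dump above:
        // HEREDOC `<<ENDTEXT\nhello world,\nENDTEXTS ENDTEXT\nthis text may contain\nany letters, even ' and "\nENDTEXT`
        System.out.println("HEREDOC `" + token.replace("\n", "\\n") + "`");
    }
}
```

As in the grammar, `ENDTEXTS ENDTEXT` does not close the heredoc: the first word is too long and the second is not at the start of a line.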