Flex/Bison - My regular expression is not matching occurrences of two or more X's, e.g. XXY-1

Question

I am using flex and bison to create a parser for a made-up programming language. Variable names can be valid or invalid:

XXXX XY-1 // valid
XXXXX Z // valid
XXX Y // valid
XXX 5Aet // invalid
XXXX XXAB-Y // invalid

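The leading X's only specify the size of the variable. The variable 5Aet is invalid because it starts with a digit. I have managed to match that case with the following regular expression: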

[\_\-0-9][a-zA-Z][a-zA-Z0-9\-\_]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME;

The variable XXAB-Y is invalid because a variable name cannot start with two or more X characters.

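I tried to write a regular expression that matches this case, but without success. I tried various combinations of the expressions below, and none of them worked. The variable keeps being matched as valid.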

[X]{2,}[A-Z0-9\-]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME;

[X]{2,0}[\_\-0-9][a-zA-Z][a-zA-Z0-9\-\_]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME;

lexer.l snippet

[\t ]+ // ignore whitespaces

\n // Ignore new line

[\"][^"]*[\"] yylval.string = strdup(yytext); return TERM_STR;

";" return TERM_SEPARATOR;

"." return TERM_FULLSTOP;

[0-9]+ yylval.integer = atoi(yytext); return TERM_INT;

XX[A-Z0-9-]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME;

[\_\-0-9]+[a-zA-Z][a-zA-Z0-9\-\_]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME;

[A-Z][A-Z0-9\-]* yylval.string = strdup(yytext); return TERM_VARIABLE_NAME;

[X]+ yylval.integer = yyleng; return TERM_SIZE;

. return TERM_INVALID_TOKEN;

parser.y snippet

program:
    /* empty */ | 
    begin middle_declarations body grammar_s end {
        printf("\nParsing complete\n");
        exit(0);
    };

begin:
    TERM_BEGINING TERM_FULLSTOP;

body:
    TERM_BODY TERM_FULLSTOP;

end:
    TERM_END TERM_FULLSTOP;

middle_declarations:
    /* empty */ |
    // Left recursive to allow for many declarations
    middle_declarations declaration TERM_FULLSTOP;

declaration:
    TERM_SIZE TERM_VARIABLE_NAME {
        createVar(, );
    }
    |
    TERM_SIZE TERM_INVALID_VARIABLE_NAME {
        printInvalidVarName();
    };

grammar_s:
    /* empty */ |
    grammar_s grammar TERM_FULLSTOP;

grammar:
    add | move | print | input;

add:
    TERM_ADD TERM_INT TERM_TO TERM_VARIABLE_NAME {
        addIntToVar(, );
    }
    |
    TERM_ADD TERM_VARIABLE_NAME TERM_TO TERM_VARIABLE_NAME {
        addVarToVar(, );
    }

    ;

move:
    TERM_MOVE TERM_VARIABLE_NAME TERM_TO TERM_VARIABLE_NAME {
        moveVarToVar(, );
    }
    |
    TERM_MOVE TERM_INT TERM_TO TERM_VARIABLE_NAME {
        moveIntToVar(, );
    }

    ;

print:
    /* empty */ |
    TERM_PRINT rest_of_print {
        printf("\n");
    };

rest_of_print:
    /* empty */ |
    rest_of_print other_print;

other_print:

    TERM_VARIABLE_NAME {
        printVarValue();
    }
    |
    TERM_SEPARATOR {
        // do nothing
    }
    |
    TERM_STR {
        printf("%s", $1);
    }

    ;

input:
    // Fullstop declares grammar
    TERM_INPUT other_input;

other_input:

    /* empty */ |
    // Input var1
    TERM_VARIABLE_NAME {
        inputValues();
    }
    |
    // Can be input var1; var2;...varN
    other_input TERM_SEPARATOR TERM_VARIABLE_NAME {
        inputValues();
    }
    ;

Debug output:

Starting parse
Entering state 0
Reading a token: Next token is token TERM_BEGINING (1.1: )
Shifting token TERM_BEGINING (1.1: )
Entering state 1
Reading a token: Next token is token TERM_FULLSTOP (1.1: )
Shifting token TERM_FULLSTOP (1.1: )
Entering state 4
Reducing stack by rule 3 (line 123):
   $1 = token TERM_BEGINING (1.1: )
   $2 = token TERM_FULLSTOP (1.1: )
-> $$ = nterm begin (1.1: )
Stack now 0
Entering state 3
Reducing stack by rule 6 (line 131):
-> $$ = nterm middle_declarations (1.1: )
Stack now 0 3
Entering state 6
Reading a token: Next token is token TERM_SIZE (1.1: )
Shifting token TERM_SIZE (1.1: )
Entering state 8
Reading a token: Next token is token TERM_VARIABLE_NAME (1.1: )
Shifting token TERM_VARIABLE_NAME (1.1: )
Entering state 13
Reducing stack by rule 8 (line 137):
   $1 = token TERM_SIZE (1.1: )
   $2 = token TERM_VARIABLE_NAME (1.1: )
-> $$ = nterm declaration (1.1: )
Stack now 0 3 6
Entering state 10
Reading a token: Next token is token TERM_FULLSTOP (1.1: )
Shifting token TERM_FULLSTOP (1.1: )
Entering state 15
Reducing stack by rule 7 (line 134):
   $1 = nterm middle_declarations (1.1: )
   $2 = nterm declaration (1.1: )
   $3 = token TERM_FULLSTOP (1.1: )
-> $$ = nterm middle_declarations (1.1: )
Stack now 0 3
Entering state 6
Reading a token: Next token is token TERM_SIZE (1.1: )
Shifting token TERM_SIZE (1.1: )
Entering state 8
Reading a token: Next token is token TERM_VARIABLE_NAME (1.1: )
Shifting token TERM_VARIABLE_NAME (1.1: )
Entering state 13
Reducing stack by rule 8 (line 137):
   $1 = token TERM_SIZE (1.1: )
   $2 = token TERM_VARIABLE_NAME (1.1: )
-> $$ = nterm declaration (1.1: )
Stack now 0 3 6
Entering state 10
Reading a token: Next token is token TERM_FULLSTOP (1.1: )
Shifting token TERM_FULLSTOP (1.1: )
Entering state 15
Reducing stack by rule 7 (line 134):
   $1 = nterm middle_declarations (1.1: )
   $2 = nterm declaration (1.1: )
   $3 = token TERM_FULLSTOP (1.1: )
-> $$ = nterm middle_declarations (1.1: )
Stack now 0 3
Entering state 6
Reading a token: Next token is token TERM_SIZE (1.1: )
Shifting token TERM_SIZE (1.1: )
Entering state 8
Reading a token: Next token is token TERM_VARIABLE_NAME (1.1: )
Shifting token TERM_VARIABLE_NAME (1.1: )
Entering state 13
Reducing stack by rule 8 (line 137):
   $1 = token TERM_SIZE (1.1: )
   $2 = token TERM_VARIABLE_NAME (1.1: )
-> $$ = nterm declaration (1.1: )
Stack now 0 3 6
Entering state 10
Reading a token: Next token is token TERM_FULLSTOP (1.1: )
Shifting token TERM_FULLSTOP (1.1: )
Entering state 15
Reducing stack by rule 7 (line 134):
   $1 = nterm middle_declarations (1.1: )
   $2 = nterm declaration (1.1: )
   $3 = token TERM_FULLSTOP (1.1: )
-> $$ = nterm middle_declarations (1.1: )
Stack now 0 3
Entering state 6
Reading a token: Next token is token TERM_BODY (1.1: )
Shifting token TERM_BODY (1.1: )
Entering state 7
Reading a token: Next token is token TERM_FULLSTOP (1.1: )
Shifting token TERM_FULLSTOP (1.1: )
Entering state 11
Reducing stack by rule 4 (line 126):
   $1 = token TERM_BODY (1.1: )
   $2 = token TERM_FULLSTOP (1.1: )
-> $$ = nterm body (1.1: )
Stack now 0 3 6
Entering state 9
Reducing stack by rule 10 (line 145):
-> $$ = nterm grammar_s (1.1: )
Stack now 0 3 6 9
Entering state 14
Reading a token: Next token is token TERM_PRINT (1.1: )
Shifting token TERM_PRINT (1.1: )
Entering state 20
Reducing stack by rule 22 (line 180):
-> $$ = nterm rest_of_print (1.1: )
Stack now 0 3 6 9 14 20
Entering state 34
Reading a token: Next token is token TERM_STR (1.1: )
Shifting token TERM_STR (1.1: )
Entering state 41
Reducing stack by rule 26 (line 194):
   $1 = token TERM_STR (1.1: )
-> $$ = nterm other_print (1.1: )
Stack now 0 3 6 9 14 20 34
Entering state 44
Reducing stack by rule 23 (line 182):
   $1 = nterm rest_of_print (1.1: )
   $2 = nterm other_print (1.1: )
-> $$ = nterm rest_of_print (1.1: )
Stack now 0 3 6 9 14 20
Entering state 34
Reading a token: Next token is token TERM_FULLSTOP (1.1: )
Reducing stack by rule 21 (line 176):
   $1 = token TERM_PRINT (1.1: )
   $2 = nterm rest_of_print (1.1: )
"hEllo"
-> $$ = nterm print (1.1: )
Stack now 0 3 6 9 14
Entering state 25
Reducing stack by rule 14 (line 150):
   $1 = nterm print (1.1: )
-> $$ = nterm grammar (1.1: )
Stack now 0 3 6 9 14
Entering state 22
Next token is token TERM_FULLSTOP (1.1: )
Shifting token TERM_FULLSTOP (1.1: )
Entering state 35
Reducing stack by rule 11 (line 147):
   $1 = nterm grammar_s (1.1: )
   $2 = nterm grammar (1.1: )
   $3 = token TERM_FULLSTOP (1.1: )
-> $$ = nterm grammar_s (1.1: )
Stack now 0 3 6 9
Entering state 14
Reading a token: Next token is token TERM_END (1.1: )
Shifting token TERM_END (1.1: )
Entering state 16
Reading a token: Next token is token TERM_FULLSTOP (1.1: )
Shifting token TERM_FULLSTOP (1.1: )
Entering state 27
Reducing stack by rule 5 (line 129):
   $1 = token TERM_END (1.1: )
   $2 = token TERM_FULLSTOP (1.1: )
-> $$ = nterm end (1.1: )
Stack now 0 3 6 9 14
Entering state 21
Reducing stack by rule 2 (line 113):
   $1 = nterm begin (1.1: )
   $2 = nterm middle_declarations (1.1: )
   $3 = nterm body (1.1: )
   $4 = nterm grammar_s (1.1: )
   $5 = nterm end (1.1: )

Sample input:

BeGiNInG.

X XXAB-.
XX XXX7.
XX XXXY.

BoDY.

print "hEllo".

EnD.

Answer 1

[X]{2,}[A-Z0-9\-]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME;

should work fine, and indeed it does work for me. It could be simplified to:

XX[A-Z0-9-]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME;

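since any additional X characters will be matched by the [A-Z0-9-] character class. (Note that it is not necessary to write \- inside a character class; a plain - is fine as long as it is the first or last character in the class.)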
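
The equivalence of the two spellings can be spot-checked outside flex. The following is only a sketch: it uses Python's re module as a stand-in for flex, on the assumption that both tools read this simple syntax the same way. The helper name same_language and the small test alphabet are made up for illustration.

```python
import re
from itertools import product

# The asker's pattern and the simplified one, transcribed for Python's re module.
ORIGINAL = re.compile(r"[X]{2,}[A-Z0-9\-]*")
SIMPLIFIED = re.compile(r"XX[A-Z0-9-]*")


def same_language(alphabet="XA9-_a", max_len=5):
    """Return True if both patterns accept exactly the same strings
    over `alphabet` up to `max_len` characters (a brute-force spot check)."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(ORIGINAL.fullmatch(s)) != bool(SIMPLIFIED.fullmatch(s)):
                return False
    return True


print(same_language())
```

This prints True for the strings it tries, which is consistent with the claim but is not a proof, and it says nothing about how flex itself parses the patterns.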

That pattern (like yours) also matches just XX, but in that case the [X]+ pattern will win because it comes earlier in the flex input file.
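
Flex picks the rule with the longest match and, when several rules match the same length, the one listed first in the file. The sketch below imitates that selection in Python so the tie-break is easy to see. It is not flex: pick_rule and SIZE_FIRST are hypothetical names, the rule order is the one this answer describes (size rule first), and Python's greedy matching is assumed to give the longest match, which holds for these simple patterns.

```python
import re


def pick_rule(rules, text):
    """Pick a rule the way flex does at the start of `text`:
    the longest match wins, and on a tie the rule listed first wins."""
    best = None
    for token, pattern in rules:
        m = re.match(pattern, text)
        # Replace only on a strictly longer match, so an earlier rule keeps a tie.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (token, m.group())
    return best


# Hypothetical ordering with the size rule first, as described in the answer.
SIZE_FIRST = [
    ("TERM_SIZE", r"X+"),
    ("TERM_INVALID_VARIABLE_NAME", r"XX[A-Z0-9-]*"),
    ("TERM_VARIABLE_NAME", r"[A-Z][A-Z0-9-]*"),
]

for text in ("XX", "XXAB-Y", "XY-1"):
    print(text, "->", pick_rule(SIZE_FIRST, text))
```

With this ordering, XX comes back as TERM_SIZE (a three-way tie that the first rule wins), XXAB-Y as TERM_INVALID_VARIABLE_NAME, and XY-1 as TERM_VARIABLE_NAME. The same tie-break means that if the general [A-Z][A-Z0-9-]* rule were listed above the XX rule, XXAB-Y would be reported as TERM_VARIABLE_NAME, because both rules match all six characters.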

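{2,0} is not a valid interval expression, because 0 is less than 2. Using it should produce an error message from flex, with the result that no lexical scanner is generated. (However, an old scanner might still be lying around, which could be confusing.) To specify "2 or more X", write X{2,} (or [X]{2,} if you prefer; "X"{2,} also works).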
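
As a loose analogy only, Python's re module also refuses an interval whose upper bound is below its lower bound. The sketch below shows that, plus the two spellings that carry over to Python unchanged. The helper compiles is a made-up name, flex's own diagnostics will look different, and the quoted form "X"{2,} is flex syntax, so it is left out here.

```python
import re


def compiles(pattern):
    """Return True if Python's re module accepts `pattern`."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(compiles(r"X{2,0}"))  # reversed bounds
print(compiles(r"X{2,}"))   # two or more X

# Both spellings accept three X's and reject a single X.
for spelling in (r"X{2,}", r"[X]{2,}"):
    accepts_three = re.fullmatch(spelling, "XXX") is not None
    accepts_one = re.fullmatch(spelling, "X") is not None
    print(spelling, accepts_three, accepts_one)
```

The first call reports False and the second True. A stale scanner left over from an earlier successful build would not show this, so it is worth checking that the generated scanner source is newer than the .l file.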
