Lexical analyzer (Flex) throws lexical error after a space followed by tokens (and more)

Question

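When I run the lexical analyzer on the example below, it does not seem to recognize the whitespace between tokens, nor the tokens defined with regular expressions.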

Lexical analyzer (Something.l):

%{
#include <stdio.h>
#include <stdlib.h>

int yylex();
//void yyerror(const char *s);
void yyerror (const char * msg)
{
  fprintf(stderr, "C-like : %s\n", msg);
  exit(1);
}
int line_num = 1;

#include "y.tab.h"
#define T_eof   0
%}


%option noyywrap 


letter  [A-Za-z]
digit   [0-9]
id  letter(letter|digit|'_')*
num [1-9]digit*('.'digit*)?
string  '(digit|letter)*'
Empty   [\t\r]|" "
line    [\n]

%%

{line}      { line_num++ ; }
"mainclass" { printf("MAINCLASS ") ; return  (MAINCLASS) ; }
"public"    { printf("PUBLIC ") ; return (PUBLIC); }
"static"    { printf("STATIC ") ; return (STATIC) ; }
"void"      { printf("VOID ") ; return (VOID) ; }
"main"      { printf("MAIN ") ; return (MAIN) ; }
"println"   { printf("PRINTLN ") ; return (PRINTLN) ; }   
"int"       { printf("INT ") ; return (INT) ; }
"float"     { printf("FLOAT ") ; return (FLOAT) ; }
"for"       { printf("FOR ") ; return (FOR) ; }
"while"     { printf("WHILE ") ; return (WHILE) ; }
"if"        { printf("IF ") ; return (IF) ; }
"else"      { printf("ELSE ") ; return (ELSE) ; }
";"     { printf("Q ") ; return (Q) ; }
"=="        { printf("EQUAL ") ; return (EQUAL) ; }
"<="        { printf("SMALLEReq ") ; return (SMALLER) ; }
">="        { printf("BIGGEReq ") ; return (BIGGER) ; }
"!="        { printf("NOTEQUAL ") ; return (NOTEQUAL) ; }
{id}        { printf("ID ") ; return (ID) ; }
{num}       { printf("NUM ") ; return (NUM) ; }
{string}    { printf("STRING ") ; return (STRING) ; }
<<EOF>>     { printf("EOF ") ; return (EOF); }
.       { printf(" lexical error in Line : %d \n ", line_num); exit(1); }
{Empty}+    { printf("EMPTY ") ; /* nothing */ }
[\(\)\{\}]  { return yytext[0] ; }
%%
int main(){
    yylex();
    return 0;
}

Running it on the example below:

mainclass Fibonacci {
    public static void main ( )
    {
        int first, second, i, tmp;
        first=0;
        second=1;
        i=0;
        while (i<10)
        {
            i=i+1;
            tmp=first+second;
            println (tmp);
            first=second;
            second=tmp;
        }
    }
}

Output:

MAINCLASS

Adding one space at the beginning of the example:

lexical error in Line : 1

Adding two or more spaces at the beginning of the example, we get this output:

EMPTY MAINCLASS

Deleting mainclass and keeping the first space and the identifier at the beginning:

EMPTY  lexical error in Line : 1

Answer 1

The order of rules in a flex file matters, and this is a particular example of that.

You have the rule:

.       { printf(" lexical error in Line : %d \n ", line_num); exit(1); }

which matches any single character other than a newline, including a space character.

Later in the file (in fact, immediately afterwards), you have

{Empty}+    { printf("EMPTY ") ; /* nothing */ }

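which can also match a single space character. But if the token to be matched is a single space, the first rule wins (precisely because when the same token is matched by more than one rule, the first rule wins).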

On the other hand, if there are two spaces, the pattern {Empty}+ matches both of them, while . matches only one. In that case {Empty}+ wins, because the longest match always wins.

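You should always put fallback rules at the very end of the scanner description (with the possible exception of <<EOF>> rules). Not only does that ensure they work as expected, it also puts the rules where people will look for them.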

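Note that there are various other errors in your macro definitions, some of which have been noted in the comments. These will also cause your scanner to reject valid input, so they need to be fixed.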

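In general, I'd recommend avoiding macros unless you have really complicated patterns in which the same subpattern appears several times, which is their use case. {Empty} is not at all descriptive, so it forces the code reader (in this case, me) to search your source file for the definition. You could use the Posix character class [[:space:]], which is well-known to anyone with Flex experience. (It includes newline, but your newline rule exists only to increment the line count; you could just include %option yylineno to have Flex do that for you.)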
