Display line numbers before displaying tokens in flex

Question

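I am using flex to read the contents of a cminus file and then display them in the following format: line number: tokens. I can display the tokens, but when I try to display the line numbers as well, I only see the line numbers. My flex file: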

%option noyywrap
%option yylineno
%{
#include <stdio.h>
int lineNo = 1;
%}
line ^.*\n
letter [a-zA-Z]
digit  [0-9]


%x IN_COMMENT
%%

{line} {printf("%d:\n", lineNo++);} 
{digit}+    {
            printf("found NUM token\n");
            }
"while"    {
            printf("found WHILE token\n");
            }
"else"    {
            printf("found ELSE token\n");
            }
"if"    {
            printf("found IF token\n");
            }
"return"    {
            printf("found RETURN token\n");
            }
"void"    {
            printf("found VOID token\n");
            }
"int"    {
            printf("found INT token\n");
            }
"+"    {
            printf("found PLUS token\n");

            }
"-"    {
            printf("found MINUS token\n");

            }
"*"    {
            printf("found TIMES token\n");

            }
"/"    {
            printf("found OVER token\n");

            }
"<"    {
            printf("found LT token\n");

            }
"<="    {
            printf("found LTEQ token\n");
            }
">"    {
            printf("found GT token\n");
            }
">="    {
            printf("found GTEQ token\n");
            }
"=="    {
            printf("found EQ token\n");
            }
"!="    {
            printf("found NEQ token\n");
            }
"="    {
            printf("found ASSIGN token\n");
            }
";"    {
            printf("found SEMI token\n");

            }
","    {
            printf("found COMMA token\n");

            }
"("    {
            printf("found LPAREN token\n");

            }
")"    {
            printf("found RPAREN token\n");

            }
"["    {
            printf("found LBRACKET token\n");

            }
"]"    {
            printf("found RBRACKET token\n");

            }
"{"    {
            printf("found LBRACE token\n");

            }
"}"    {
            printf("found RBRACE token\n");

            }


[ \t]+
<INITIAL>{
"/*"              BEGIN(IN_COMMENT);
}
<IN_COMMENT>{
"*/"      BEGIN(INITIAL);
[^*\n]+   // eat comment in chunks
"*"       // eat the lone star
\n        yylineno++;
}

{letter}{letter}*  {
            printf("found ID token\n");
            }
. {printf("Unrecognized character");}
%%

int main( int argc, char **argv )
{
++argv, --argc;
if ( argc > 0 )
     yyin = fopen( argv[0], "r" );
else
     yyin = stdin;
yylex();
}

My input file:

/* Sample program
  in CMinus language -
  computes factorial
*/
void main (void)
{
   int x;
   int whileimatit;

   /* read x; { input an integer } */
   x = input();

   /* if x > 0 then { don't compute if x <= 0 } */
   if ( x > 0 ) {
      /*     fact := 1; */
      whileimatit = 1;
      /*   repeat */
      while (x > 0)
      {
     /*     fact := fact * x; */
     whileimatit = whileimatit * x;
     /*     x := x - 1 */
     x = x - 1;
     /*   until x = 0; */
      }
      /* write fact  { output factorial of x } */
      output(whileimatit);

   /* end */
   }
}

My output:

1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:
31:

Expected output:

1:
2:
3:
4:
5:
found VOID token
found ID token
found LPAREN token
found VOID token
found RPAREN token
6:
found LBRACE token
7:
found INT token
found ID token
found SEMI token
8:
found INT token
found ID token
found SEMI token
9:
10:
11:
found ID token
found ASSIGN token
found ID token
found LPAREN token
found RPAREN token
found SEMI token
12:
13:
14:
found IF token
found LPAREN token
found ID token
found GT token
found NUM token
found RPAREN token
found LBRACE token
15:
16:
found ID token
found ASSIGN token
found NUM token
found SEMI token
17:
18:
found WHILE token
found LPAREN token
found ID token
found GT token
found NUM token
found RPAREN token
19:
found LBRACE token
20:
21:
found ID token
found ASSIGN token
found ID token
found TIMES token
found ID token
found SEMI token
22:
23:
found ID token
found ASSIGN token
found ID token
found MINUS token
found NUM token
found SEMI token
24:
25:
found RBRACE token
26:
27:
found ID token
found LPAREN token
found ID token
found RPAREN token
found SEMI token
28:
29:
30:
found RBRACE token
31:
found RBRACE token

If I remove the following line:

{line} {printf("%d:\n", lineNo++);}

I get the following output:

found VOID token
found ID token
found LPAREN token
found VOID token
found RPAREN token

found LBRACE token

found INT token
found ID token
found SEMI token

found INT token
found ID token
found SEMI token



found ID token
found ASSIGN token
found ID token
found LPAREN token
found RPAREN token
found SEMI token



found IF token
found LPAREN token
found ID token
found GT token
found NUM token
found RPAREN token
found LBRACE token


found ID token
found ASSIGN token
found NUM token
found SEMI token


found WHILE token
found LPAREN token
found ID token
found GT token
found NUM token
found RPAREN token

found LBRACE token


found ID token
found ASSIGN token
found ID token
found TIMES token
found ID token
found SEMI token


found ID token
found ASSIGN token
found ID token
found MINUS token
found NUM token
found SEMI token


found RBRACE token


found ID token
found LPAREN token
found ID token
found RPAREN token
found SEMI token



found RBRACE token

found RBRACE token

I cannot get the line numbers printed along with the tokens. Can anyone help?

Answer 1

You defined line as

line ^.*\n

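which matches an entire line, and that is exactly what happens. Flex always picks the rule with the longest match, and nothing can match more than the whole line, so every line is matched as a line token and none of the other rules is ever used.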
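To make the longest-match argument concrete, here is a small hand-written C sketch. It is not how flex is implemented; the helper names match_line, match_literal and line_rule_wins are made up for illustration, and only the line pattern and one keyword rule are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Characters the `line` pattern (^.*\n) would consume at a line start:
   everything up to and including the newline, or 0 if there is none. */
size_t match_line(const char *s) {
    assert(s != NULL);
    const char *nl = strchr(s, '\n');
    return nl ? (size_t)(nl - s) + 1 : 0;
}

/* Characters a literal keyword rule such as "void" would consume. */
size_t match_literal(const char *s, const char *kw) {
    assert(s != NULL && kw != NULL);
    size_t n = strlen(kw);
    return strncmp(s, kw, n) == 0 ? n : 0;
}

/* Longest match wins; on a tie the rule listed first ({line}) wins. */
int line_rule_wins(const char *s, const char *kw) {
    size_t line_len = match_line(s);
    return line_len > 0 && line_len >= match_literal(s, kw);
}
```

For the input line "void main (void)\n" the line pattern covers 17 characters while "void" covers only 4, so the keyword rule never gets a chance. After each match the scanner is again at the start of a line, which is why the same thing repeats for every line of the file.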

You could drop the line definition [Note 1] and use a pattern/action rule instead:

\n    {printf("%d:\n", lineNo++);}

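However, that rule fires at the end of a line rather than at the beginning. It also does not fire at the very start of the scan, and it does fire at the end of the last line, neither of which is desirable.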

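If you just want debugging output, I strongly recommend using flex's built-in trace facility, which you enable with the -d option when you build your scanner. You may also want to use %option yylineno, which tells flex to track the input line number automatically. (Letting flex do this is much more robust than doing it yourself, and obviously a bit less work.)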

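If you really want to output the line number at the start of each line, you can use a start condition combined with yyless() to rescan. Here is a minimal example: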

%option nodefault noyywrap noinput nounput
%option yylineno

%x BOL
%%
                BEGIN(BOL);                /* Note 2 */
<BOL>.|\n       { yyless(0);               /* Note 3 */
                  printf("Line %d:", yylineno);
                  BEGIN(INITIAL);
                }
\n              putchar('\n'); BEGIN(BOL); /* Note 4 */

  /* Rest of the rules go here. The following is minimal. */
[[:blank:]]+    ;
[^[:blank:]\n]+ printf(" word: '%s'", yytext);

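Notes:

1. In fact, you could drop most of the definitions, if not all of them. Is [0-9] less readable than {digit}? I would say no, because it has a clear meaning, whereas digit could be defined as anything. Even clearer is the built-in character class [[:digit:]].
2. (Line 6) Any action placed before the first rule is executed every time yylex is called. In this case we only call yylex once, so we can get away with it; if we were actually returning tokens, it would be more convenient to set the initial state from the driver. Alternatively, just use INITIAL as the start-of-line state and some other start condition for normal operation.
3. (Lines 7-9) When we are in the BOL state, we respond to any following character, including a newline (which indicates an empty line). If we are at EOF, this rule will not execute, because in that case there is no following character. The response is to remove the character we just read from the token (which leaves the token empty) and then print the message indicating which line we are on. Finally, we switch to the normal scanning state, which starts at the first character of the line (because of yyless).

   It is tempting to try to do this with the ^ anchor, but that won't work. First, flex doesn't allow empty patterns, so the anchor by itself is not a valid pattern; you would still need to match the following character. But then there is no way to rescan that character without triggering the anchor rule again, because on the rescan the character is still at the beginning of a line. Hence the start condition.
4. (Line 11) When we hit a newline, we need to change to the BOL state so that the next character (if any) triggers the output of the line number. Since this example prints the tokens on the same line as the line number, we also need to send a newline to the output to terminate the current line.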
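If the interaction between the BOL state and yyless(0) is hard to picture, the same two-state logic can be written out by hand in plain C. This is only a sketch of the control flow, not what flex generates: the names annotate, append and NORMAL (standing in for INITIAL) are invented here, and the output goes to a caller-supplied buffer so the result is easy to inspect.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum state { BOL, NORMAL };   /* NORMAL plays the role of INITIAL */

/* Append len bytes of s to out, truncating rather than overflowing. */
static size_t append(char *out, size_t cap, size_t n, const char *s, size_t len) {
    if (n + len >= cap) len = cap - 1 - n;
    memcpy(out + n, s, len);
    out[n + len] = '\0';
    return n + len;
}

/* Prefix every input line with "Line N:" and list its words after it. */
size_t annotate(const char *in, char *out, size_t cap) {
    assert(in != NULL && out != NULL && cap > 0);
    enum state st = BOL;
    int lineno = 1;
    size_t n = 0;
    out[0] = '\0';
    for (const char *p = in; *p != '\0'; ) {
        if (st == BOL) {
            /* Like the <BOL> rule with yyless(0): print, consume nothing. */
            char prefix[32];
            snprintf(prefix, sizeof prefix, "Line %d:", lineno);
            n = append(out, cap, n, prefix, strlen(prefix));
            st = NORMAL;
        } else if (*p == '\n') {
            /* Like the \n rule: end the output line, go back to BOL. */
            n = append(out, cap, n, "\n", 1);
            lineno++;
            st = BOL;
            p++;
        } else if (*p == ' ' || *p == '\t') {
            p++;                          /* skip blanks */
        } else {
            const char *start = p;        /* a run of non-blank characters */
            while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n') p++;
            n = append(out, cap, n, " word: '", 8);
            n = append(out, cap, n, start, (size_t)(p - start));
            n = append(out, cap, n, "'", 1);
        }
    }
    return n;
}
```

Calling annotate("int x;\n\ny\n", buf, sizeof buf) leaves "Line 1: word: 'int' word: 'x;'", "Line 2:" and "Line 3: word: 'y'" in buf, each terminated by a newline. No "Line 4:" is produced after the final newline, which corresponds to the BOL rule not firing at EOF as described in Note 3.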
