Why does my string token gain a newline at its final reduction in my C++ bison program?

Question

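I'm writing a parser with flex and bison for a university assignment. At the moment my goal is to read expressions made up of integers, strings, and their operators. Integers work fine; the problem is with strings. When I run the program and type a string at the console, it should print back the result of the expression, which in this case is the type String followed by the string's value. So if I enter "hello", I should get back it:String="hello". The problem is that at the final reduction in my bison file (where bison reduces to the start variable using one of the start variable's rules), the string value somehow gains a newline at its end. The string ends up as "hello"\n, so it:String="hello"\n is printed to the console. The parse trace confirms that the string value is correct right up until that final reduction, where it picks up the newline, and I can't figure out why. I think the problem will become clear with some code snippets.

Here are the important parts of the lex file. The last rule is where I return a STRING token.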

%{
#include <iostream>
#include <string>
#include <stdlib.h>
#include "y.tab.h"
using namespace std;
void yyerror(char*);
%}

%%

0                       { yylval.iVal = atoi(yytext);
                          return INTEGER;
                        }

[1-9][0-9]*             { yylval.iVal = atoi(yytext);
                          return INTEGER;
                        }

[-+()~$^*/;\n]          return *yytext;
"=="                    return EQ;
"!="                    return NE;
"&&"                    return AND;
"||"                    return OR;
"\""[^"\""]*"\""        { yylval.strVal = yytext;
                          return STRING; }

Here is the yacc file. The rule "program: program strExpr '\n'" is where I print the response to the console.

%token EQ NE AND OR STRFIND
%token<iVal> INTEGER
%token<strVal> STRING
%left OR
%left AND
%left EQ NE
%left '+' '-'
%left '*' '/'
%left UNARY
%right '^'

%{
    #include <iostream>
    #include <cmath>
    #include <string>
    #define YYDEBUG 1
    using namespace std;
    void yyerror(char *);
    int yylex(void);
%}

%union {
    int iVal;
    char* strVal;
}

%type<iVal> intExpr
%type<strVal> strExpr

%printer {fprintf(yyoutput, "%s", $$);} strExpr

%%

program:
    program intExpr '\n'         {cout<<"it:Int="<<$2<<"\n";}
    | program strExpr '\n'       {cout<<"it:String="<<$2<<"\n";}
    | program intExpr ';'
    | program strExpr ';'
    | program intExpr ';' '\n'
    | program strExpr ';' '\n'
    | program '\n'
    | program ';'
    | program ';' '\n'
    | ;
expr:
    intExpr
    | strExpr

intExpr:
    INTEGER
    | '-' intExpr %prec UNARY          { $$ = $2 * (-1); }
    | '+' intExpr %prec UNARY          { $$ = $2; }
    | intExpr '+' intExpr              { $$ = $1 + $3; }
    | intExpr '*' intExpr              { $$ = $1 * $3; }
    | intExpr '-' intExpr              { $$ = $1 - $3; }
    | intExpr '/' intExpr              { if ($3 == 0) {
                                           yyerror(0);
                                           return 1;
                                       } else
                                           $$ = $1 / $3; }
    | '(' intExpr ')'                  { $$ = $2; }
    | intExpr '^' intExpr              { int i;
                                         int val = 1;
                                         for (i = 0; i < $3; i++) {
                                             val = val * $1;
                                         }
                                         $$ = val;
                                       }
    | intExpr EQ intExpr               { if ($1 == $3)
                                             $$ = 1;
                                         else
                                             $$ = 0;
                                       }
    | intExpr NE intExpr               { if ($1 != $3)
                                             $$ = 1;
                                         else
                                             $$ = 0;
                                       }
    | intExpr AND intExpr              { if ($1 != 0 && $3 != 0)
                                             $$ = 1;
                                         else
                                             $$ = 0;
                                       }
    | intExpr OR intExpr               { if ($1 != 0 || $3 != 0)
                                             $$ = 1;
                                         else
                                             $$ = 0;
                                       }
    | ;

strExpr:
    STRING                             
    | '(' strExpr ')'                  { $$ = $2; }
    | ;

%%

void yyerror(char *s) {
    fprintf(stderr, "error\n");
}

int main(void) {
    yydebug = 1;
    yyparse();
    return 0;
}

Here is the output of a sample run:

"hello"
it:String="hello"

1+1
it:Int=2
3+4
it:Int=7

What is that extra newline after it:String="hello"?

Here is the parse trace. It tells me the newline is added just before the final reduction, but I don't know why.

Starting parse
Entering state 0
Reducing stack by rule 10 (line 45):
-> $$ = nterm program ()
Stack now 0
Entering state 1
Reading a token: "hello"
Next token is token STRING ()
Shifting token STRING ()
Entering state 4
Reducing stack by rule 25 (line 93):
   $1 = token STRING ()
-> $$ = nterm strExpr ("hello")
Stack now 0 1
Entering state 11
Reading a token: Next token is token '\n' ()
Shifting token '\n' ()
Entering state 29
Reducing stack by rule 2 (line 37):
   $1 = nterm program ()
   $2 = nterm strExpr ("hello"
)
   $3 = token '\n' ()
it:String="hello"

-> $$ = nterm program ()
Stack now 0
Entering state 1
Reading a token:

Thank you very much for your help.

Answer 1

yylval.strVal = yytext;

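yytext is a pointer into the scanner's internal buffer, and the contents of that buffer change every time a token is read. flex terminates the current token by temporarily writing a NUL just past it; once the '\n' token has been scanned, the terminator sits after the newline, so the pointer you saved for STRING now reads "hello" followed by that newline.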
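
To make the mechanism concrete, here is a minimal C++ sketch of it. ToyScanner and demo are hypothetical names, not part of flex; the sketch only imitates, under that assumption, the way a generated scanner NUL-terminates the current token inside its input buffer and restores the overwritten character before moving on to the next token.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical stand-in for a flex scanner (not real flex code): one writable
// input buffer in which the current token is NUL-terminated in place.
struct ToyScanner {
    char buf[32];
    char* text = buf;      // plays the role of yytext
    std::size_t leng = 0;  // plays the role of yyleng
    char hold = '\0';      // character currently overwritten by the terminator

    explicit ToyScanner(const char* input) {
        std::strncpy(buf, input, sizeof buf - 1);
        buf[sizeof buf - 1] = '\0';
        hold = buf[0];
    }

    // Move on to the next token, which is n characters long.
    void next(std::size_t n) {
        text[leng] = hold;   // restore the character the old terminator replaced
        text += leng;        // skip past the previous token
        leng = n;
        hold = text[leng];   // remember the character after the new token
        text[leng] = '\0';   // terminate the new token in place
    }
};

// Returns {what a saved pointer reads, what a saved copy reads} after the
// token that follows the string has been scanned.
std::pair<std::string, std::string> demo() {
    ToyScanner s("\"hello\"\n");
    s.next(7);                        // STRING token: "hello" including quotes
    assert(std::strcmp(s.text, "\"hello\"") == 0);
    const char* aliased = s.text;     // like yylval.strVal = yytext
    std::string copied = s.text;      // like taking a copy of yytext
    s.next(1);                        // the '\n' token
    return std::make_pair(std::string(aliased), copied);
}
```

Under these assumptions, the first element returned by demo() is the quoted string with a trailing newline, while the second is still just the quoted string. That is the difference between keeping the pointer and keeping a copy.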

yylval.strVal = strdup(yytext);

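Copying the token text this way gets rid of the newline, but it does of course introduce a memory leak, which you will need to take care of.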
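
One possible way to take care of it, shown only as a sketch against the grammar in the question: free each string in the action that consumes it, and add a Bison %destructor for values the parser discards on its own (for example during error recovery). This assumes every strVal is either null or came from strdup, which the empty strExpr alternative in the question does not guarantee, and that <stdlib.h> is included in the prologue for free.

```yacc
/* Called by Bison for strVal values it throws away itself. */
%destructor { free($$); } <strVal>

%%

program:
    program strExpr '\n'         { cout << "it:String=" << $2 << "\n";
                                   free($2);   /* last use of the copy */ }
    | program strExpr ';'        { free($2); }
```

The rule '(' strExpr ')' can keep passing $2 up unchanged, since ownership simply moves to the enclosing strExpr.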
