Segmentation fault with yacc/bison

Question

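I'm trying to write a simple HTTP request parser for a school assignment, but I keep hitting a segmentation fault that I can't get rid of. I think my production rules are fine. I ran the Bison parser with tracing enabled, and it always segfaults at the point where my header is being parsed: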

Reducing stack by rule 9 (line 59):
   $1 = token ID ()
   $2 = token COLON ()
   $3 = token STRING ()
[4]    36661 segmentation fault (core dumped)  ./problem1 < input.txt

Here are the contents of my request.l file:

%option noyywrap
%{
    #include<stdio.h>
    #include "request.tab.h"
    char *strclone(char *str);
%}

num                                     [0-9]+(\.[0-9]{1,2})?
letter                                  [a-zA-Z]
letternum                               [a-zA-Z0-9\-]
id                                      {letter}{letternum}*
string                                  \"[^"]*\"
fieldvalue                              {string}|{num}

%%

(GET|HEAD|POST|PUT|DELETE|OPTIONS)      { yylval = strclone(yytext); return METHOD; }
HTTP\/{num}                             { yylval = strclone(yytext); return VERSION; }
{id}                                    { yylval = strclone(yytext); return ID; }
"/"                                     { return SLASH; }
"\n"                                    { return NEWLINE; }
{string}                                { yylval = strclone(yytext); return STRING; }
":"                                     { return COLON; }
[ \t\n]+                                       ;
. {
    printf("Unexpected: %c\nExiting...\n", *yytext);
    exit(0);
}

%%

char *strclone(char *str) {
    int len = strlen(str);
    char *clone = (char *)malloc(sizeof(char)*(len+1));
    strcpy(clone,str);
    return clone;
}

And my request.y file:

%{
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#define YYSTYPE char*

extern int yylex();
extern int yyparse();
extern FILE* yyin;

void yyerror(const char* s);
%}

%token METHOD
%token SLASH
%token VERSION
%token STRING
%token ID
%token COLON
%token NEWLINE

%%

REQUEST: METHOD URI VERSION NEWLINE HEADERS {
       printf("%s %s", $1, $2);
    }
;

URI: SLASH DIR {
        $$ = (char *)malloc(sizeof(char)*(1+strlen($2)+1));
        sprintf($$, "//%s", $2);
    }
;

DIR: ID SLASH {
        $$ = (char *)malloc(sizeof(char)*(strlen($1)+2));
        sprintf($$, "%s//", $1);
    }
    |ID {
        $$ = $1;
    }
    | {
        $$ = "";
    }
;

HEADERS: HEADER {
        $$ = $1;
    }
    |HEADER NEWLINE HEADERS {
        $$ = (char *)malloc(sizeof(char)*(strlen($1)+1+strlen($3)+1));
        sprintf($$, "%s\n%s", $1, $3);
    }
    |{
        $$ = "";
    }
;

HEADER: ID COLON STRING {
        $$ = (char *)malloc(sizeof(char)*(strlen($1)+1+strlen($2)+1));
        sprintf($$, "%s:%s", $1, $3);
    }
;

%%

void yyerror (char const *s) {
   fprintf(stderr, "Poruka nije tacna\n");
}

int main() {
    yydebug = 1;
    yyin = stdin;

    do {
        yyparse();
    } while(!feof(yyin));

    return 0;
}

And here is the input.txt that I pass as input:

GET / HTTP/1.1
Host: "developer.mozzila.org"
Accept-language: "fr"

Answer 1

HEADER: ID COLON STRING {
    $$ = (char *)malloc(sizeof(char)*(strlen($1)+1+strlen($2)+1));
    sprintf($$, "%s:%s", $1, $3);
};

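Shouldn't you use strlen($3) in the expression that computes the length of the combined string? strlen($2), as you wrote it, only gives the length of the colon string, which should be 1. If you then sprintf into a buffer that is too short, you write past the end of that buffer.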
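To make the length computation concrete, here is a minimal C sketch of the allocation the HEADER action seems to intend. The helper name join_header is hypothetical (it does not appear in the original grammar), and the use of snprintf instead of sprintf is my own substitution so that a miscounted size truncates the text rather than overrunning the buffer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the original grammar: joins a header
 * name and value as "name:value" in a freshly allocated buffer.
 * Returns NULL if allocation fails; the caller frees the result. */
char *join_header(const char *name, const char *value) {
    /* name + ':' + value + terminating '\0' */
    size_t size = strlen(name) + 1 + strlen(value) + 1;
    char *joined = malloc(size);
    if (joined == NULL) {
        return NULL;
    }
    /* snprintf never writes more than `size` bytes, so a wrong length
     * computation would truncate the text instead of overflowing. */
    snprintf(joined, size, "%s:%s", name, value);
    return joined;
}
```

Under these assumptions, the HEADER action could be reduced to something like $$ = join_header($1, $3); so that both lengths come from the same two operands that are printed.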

Answer 2

In request.y, you include the directive

#define YYSTYPE char*

So in the parser code generated by Bison, yylval has type char*. But that line is not inserted into request.l, so in the scanner code generated by Flex, yylval has its default type, int.

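You could fix this by adding the definition of YYSTYPE to your request.l file, but then the same setting is repeated in two places, which is a recipe for disaster. Instead, use Bison's declaration syntax: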

%define api.value.type { char* }

(Note: this is a Bison declaration, not a C preprocessor definition, so it goes with the other Bison % directives.)

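The advantage of this solution is that Bison also adds the declaration to the header file it generates. Since that file is #included in request.l, no modification to your scanner is needed.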

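Unfortunately, C lets you convert a pointer to an integer type even if the integer type is too narrow to hold the whole address, which is the case on a typical 64-bit platform with 8-byte pointers and 4-byte int. So in your scanner, assigning an eight-byte pointer to what the compiler believes is a four-byte int means the value gets truncated. When the parser then tries to use it as an address, you get a segfault. If you're lucky.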
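The truncation described above can be modeled with a short C sketch. It uses fixed-width unsigned types as a stand-in for a platform with 8-byte pointers and 4-byte int; that is an assumption for illustration, and both function names are hypothetical. A real int is signed, so widening it back could also sign-extend, but the upper half of the address is lost either way.

```c
#include <stdint.h>

/* Illustration only: models what happens when a 64-bit address is stored
 * in a 32-bit integer and later widened back to pointer size.
 * The fixed-width types are an assumption standing in for a platform
 * with 8-byte pointers and 4-byte int. */
uint64_t address_after_int_roundtrip(uint64_t address) {
    uint32_t narrowed = (uint32_t)address;  /* upper 32 bits are discarded */
    return (uint64_t)narrowed;              /* widening cannot restore them */
}

/* Returns 1 if the address would come back unchanged, 0 if it is corrupted. */
int survives_int_roundtrip(uint64_t address) {
    return address_after_int_roundtrip(address) == address;
}
```

An address that happens to fit in 32 bits survives the round trip, which is why this kind of bug can go unnoticed on some systems and crash on others.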

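Most C compilers will warn you about this truncation, but only if you tell the compiler that you want to see warnings (-Wall for clang and gcc). Compiling with -Wall is always important, even when compiling the output of a code generator.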

You will also need to correct the typo pointed out by @JakobStark.
