Strange flex behaviour

Question

I have a scanner for dice notation, built from the following flex definition:

%option debug
%option noyywrap
%option yylineno

%{    
#include <limits.h>
#include "parser.h"
%} 

%%

[0-9]+      {
    errno = 0;
#ifdef DEBUG
    printf("Scanner: NUMBER %s (%i)\n", yytext, yyleng);
#endif
    long number =  strtol( yytext, NULL, 10);
    if ( errno != 0 && errno != ERANGE && number == 0 ) {
        printf("Error: incorrect number %s\n", yytext);
        exit(EXIT_FAILURE);
    }
    // we only accept integers
    if ( number > INT_MAX ) {
        printf("Error: %s is too large\n", yytext );
        exit(EXIT_FAILURE);
    }
    yylval.int_type = (int)number;
    return NUMBER;
}

\+          { return PLUS;    }

\-          { return MINUS;   }

[*x]        { return TIMES;   }

\/          { return DIV;     }

d|D         {
#ifdef DEBUG
    printf("Scanner: DICE\n");
#endif
    return DICE;
}

f|F         { return FUDGE;   }

h|H         { return HIGH;    }

l|L         { return LOW;     }

"("         { return LPAREN;  }

")"         { return RPAREN;  }

"{"         {
#ifdef DEBUG
    printf("Scanner: LCURLY\n");
#endif
    return LCURLY;
}

"}"         {
#ifdef DEBUG
    printf("Scanner: RCURLY\n");
#endif
    return RCURLY;
}

">"         { return GT; }
">="        { return GE; }
"<"         { return LT; }
"<="        { return LE; }
"!="        { return NE; }
"<>"        { return NE; }
"%"         { return PERCENT; }

,           {
#ifdef DEBUG
    printf("Scanner: COMMA\n");
#endif
    return COMMA;
}

[[:blank:]] {
    /* ignore spaces */
#ifdef DEBUG
    printf("Scanner: BLANK\n");
#endif
}

.           { printf("Error: unknown symbol '%s'\n", yytext); exit(EXIT_FAILURE); }

%%

When I parse something like 4{3d6, 1d5}, everything works fine. But for 4{3d6,1d5} the scanner behaves strangely and misses the first curly brace.

The debug output is:

--accepting rule at line 20 ("43")
Scanner: NUMBER 43 (2)
--accepting rule at line 47 ("d")
Scanner: DICE
--accepting rule at line 20 ("641")
Scanner: NUMBER 641 (3)
--accepting rule at line 47 ("d")
Scanner: DICE

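Even though { is not matched by [0-9]+, the scanner appears to be matching 4{3 as 43.

Since the difference in behaviour is triggered by whitespace that appears much later in the expression, I suspect I am missing something in my whitespace handling, but I don't understand why that would disturb the start of the integer-matching rule.

Any hints?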

Answer 1

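If a Unix shell such as bash is processing your input, the unquoted 4{3d6,1d5} will be brace-expanded to 43d6 41d5 before your program ever sees it. Your lexer ignores blanks altogether, so that becomes 43d641d5, which is what you are reporting (give or take the part of your run that you left out?).

I copied your code, and when I run something like:

echo 4{3d6,1d5} | ./lex

I see your problem. If I run:

echo '4{3d6,1d5}' | ./lex

then all is well. If I put 4{3d6,1d5} in a file and run the lexer with that file as its input, that is fine too.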
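A quick way to confirm this without involving the scanner at all is to look at what the shell actually hands to echo. The following is only a minimal sketch, assuming bash is installed; brace expansion is a feature of shells such as bash, ksh and zsh rather than of plain POSIX sh, which is why bash is invoked explicitly. The variable names unquoted and quoted are just for illustration.

```shell
# Run the unquoted command in bash explicitly, since a plain POSIX sh
# (for example dash) does not perform brace expansion.
unquoted=$(bash -c 'echo 4{3d6,1d5}')

# Quoting the braces suppresses the expansion, so the text arrives unchanged.
quoted=$(bash -c "echo '4{3d6,1d5}'")

printf 'unquoted: %s\n' "$unquoted"   # unquoted: 43d6 41d5
printf 'quoted:   %s\n' "$quoted"     # quoted:   4{3d6,1d5}
```

The unquoted form never contains a curly brace by the time it reaches the scanner, so the flex rules are not at fault; quoting the argument or reading the expression from a file avoids the expansion.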


lexical-analysis

flex-lexer