使用 Bison 实现语法。语法控制流意外

Question

抽象语法。实际语法如下：

start ->  t1  V1;

V1 -->   t2   V1
       | t3   V2
       ;

V2 -->   t4
       | /* Empty */
       ;

当控件处于V2 并遇到令牌t3 时。控制回到 V1。好了到此为止。

但是控制并没有停在V1触发t3，而是回到开始，就是这个问题

实际语法。解释如下。

%{
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

void yyerror (char *s); 
extern FILE *yyin;

%}

%token MAIN   

%token INT_DT INT_LIT   /* t_INT_DT --> Integer Data type, i.e. "int"
                         * t_INT_LIT --> An integer, i.e. "5".
                         */

%token VAR              // A variable

%token SCANF  PRINTF    // For the printf and the scanf statements.

%token IF     ELSE      // For the if and else statements.
%token GOTO

%token OPRTR            // For the operator in an expression.

/* Begining of a basic block, BBS, Basic Block Statement.
 * BBS_ST = "<bb "
 * BBS_END1 ">:"
 * BBS_END2   ">;"
 * Example:   
 * <bb 2>:
 *      Statements;
 *      goto <bb 3>;
 */
%token BBS_ST   // "<bb "// it is "<bb ".
%token BBS_END1 // ">:" 
%token BBS_END2 // ">;" 

// Opening and closing Braces, i.e. { & }
%token OP_BR  // "{" 
%token CL_BR  // "}"

// For opening and closing Parantheses, ( & )
%token OP_PR  //"("  
%token CL_PR  //")"

%token EQUAL  // "="
%token COMMA  // ","       
%token NBSP   // "&"
%token SM_CLN // ";"       // For ";" 

%token RETURN           // For the return statement.

%start begin

%%

begin:   MAIN    main   ;

main:    OP_BR    functn   CL_BR   ;

functn:   INT_DT   VAR      SM_CLN   functn  /*Recursion to stay in functn */
        | BBS_ST   INT_LIT  BBS_END1  bb     
        ;

bb:    SCANF    var_List   CL_PR    SM_CLN  bb
    |  PRINTF   var_List   CL_PR    SM_CLN  bb
    |  VAR      EQUAL      expr     SM_CLN  bb
    |  IF       OP_PR      expr     CL_PR   bb
    |  ELSE     bb
    |  GOTO     BBS_ST     INT_LIT  BBS_END2
    |  RETURN   SM_CLN    
    |  /* Empty Rule. */  
    ;  /* bb has can't catch BBS_ST as first token, thus it must return
        * to the calling rule and functn can catch BBS_ST. */

var_List:    COMMA   VAR   var_List 
          |  /* Empty Rule. */ 
          ;

expr:     VAR      expr2 
      |   INT_LIT  expr2  
      ;

expr2:    OPRTR    expr3  
       |  /* Empty Rule */
       ;

expr3:    VAR     
       |  INT_LIT  
       ;

%%
int main (int argc, char *argv[])
{
    #if YYDEBUG == 1
        extern int yydebug;
        yydebug = 1;
    #endif

    if (argc == 2)
    {
        yyin = fopen (argv [1], "r");
        if (yyin == NULL)
            perror (argv [1]); 
    }

    yyparse ();

    fclose (yyin);
    return 0;    
}


void yyerror (char *s)
{
    printf ("Error: %s\n", s);
}

到目前为止，答案中建议的想法已经实现，但这没有帮助。

begin:  MAIN main ; 

main:   OP_BR   functn_Decls   CL_BR 
        ; 

functn_Decls:   functn_Decls INT_DT VAR SM_CLN 
              | functn 
              ; 

functn:    functn BBS_ST INT_LIT BBS_END1 
        |  bb 
        ; 

bb:    bb   stmnt 
    |  /* Empty Rule */ 
    ; 

stmnt:   t1  // terminals only. 
       ; 

var_List:   t2 // terminal just for illustration. 
            ;

expr:  t3 // terminal just for illustration.
       ;

导致输入错误：

main ()
{
  <bb 2>:
  goto <bb 4>;

  <bb 3>:
}

错误：

在 <bb 时，令牌 BBS_ST 被触发。此时控件在bb中，喜欢bb --> Waiting for a token 由于 bb 中没有规则以 BBS_ST 开头，因此它 returns 调用规则。

现在调用规则为functn，其中有一条规则以BBS_ST开头。问题是，这条规则没有被调用。

相反，控件在 P1 到达 functn 的父级。如果在 P1 我添加 BBS_ST，它会捕获此标记。

请指出为什么会这样？

对应的Flex文件：

%{
    // Include the file
    #include "Parse_One_Line2.tab.h"
    extern YYSTYPE  yylval;
    extern char    *yytext;
%}

LB   ^[ \t]*
CH   [a-zA-Z]
DI   [0-9]

%%

{LB}main[ ]*\([.]*\)    return MAIN; // the main() call.

{LB}int[ ]              return INT_DT;

[+-]?{CH}({CH}{DI}_)*   return VAR; 

[+-]?[0-9]+             return INT_LIT;  

{LB}scanf[ ]?\(\".*\"   return SCANF;

{LB}printf[ ]?\(\".*\"  return PRINTF;

{LB}if[ ]               return IF; // If statement.

{LB}else[ ]             return ELSE; // If statement.

{LB}goto[ ]             return GOTO;

{LB}return              return RETURN;

\{CLOBBER\};            ;// Ignore it.

"<bb "                  return BBS_ST;  // next integer is a BB Index.

">:"                    return BBS_END1; // token Greater than Colon.

">;"                    return BBS_END2;

^;;[^\n]*[\n]           ; // Ignore it, comment.

"+"|"-"|"*"|"/"|"%"|"=="|">="|"<="|"<"|">"|"!=" return OPRTR; 

^[{]                    return OP_BR; // It is "{".

^[}]                    return CL_BR; // It is "}".

"("                     return OP_PR; // Paranthesis Open

")"                     return CL_PR; // Paranthesis Close


";"                     return SM_CLN;

"="                     return EQUAL;

"&"                     return NBSP;

","                     return COMMA;

[\n]                    ; /* Skip these characters. */

.                       ; /* Skip.   */

%%


int yywrap (void)
{
    return 1;
}

使用 YYDEBUG 标志的输出文件。

Starting parse
Entering state 0
Reading a token: Next token is token MAIN ()
Shifting token MAIN ()
Entering state 1
Reading a token: Next token is token OP_BR ()
Shifting token OP_BR ()
Entering state 3
Reading a token: Next token is token INT_DT ()
Shifting token INT_DT ()
Entering state 6
Reading a token: Next token is token VAR ()
Shifting token VAR ()
Entering state 9
Reading a token: Next token is token SM_CLN ()
Shifting token SM_CLN ()
Entering state 12
Reading a token: Next token is token BBS_ST ()
Shifting token BBS_ST ()
Entering state 7
Reading a token: Next token is token INT_LIT ()
Shifting token INT_LIT ()
Entering state 10
Reading a token: Next token is token BBS_END1 ()
Shifting token BBS_END1 ()
Entering state 13
Reading a token: Next token is token GOTO ()
Shifting token GOTO ()
Entering state 20
Reading a token: Next token is token BBS_ST ()
Shifting token BBS_ST ()
Entering state 29
Reading a token: Next token is token INT_LIT ()
Shifting token INT_LIT ()
Entering state 38
Reading a token: Next token is token BBS_END2 ()
Shifting token BBS_END2 ()
Entering state 47
Reducing stack by rule 10 (line 76):
    = token GOTO ()
    = token BBS_ST ()
    = token INT_LIT ()
    = token BBS_END2 ()
-> $$ = nterm bb ()
Stack now 0 1 3 6 9 12 7 10 13
Entering state 22
Reducing stack by rule 4 (line 68):
    = token BBS_ST ()
    = token INT_LIT ()
    = token BBS_END1 ()
    = nterm bb ()
-> $$ = nterm functn ()
Stack now 0 1 3 6 9 12
Entering state 14
Reducing stack by rule 3 (line 67):
    = token INT_DT ()
    = token VAR ()
    = token SM_CLN ()
    = nterm functn ()
-> $$ = nterm functn ()
Stack now 0 1 3
Entering state 8
Reading a token: Next token is token BBS_ST ()
Error: popping nterm functn ()
Stack now 0 1 3
Error: popping token OP_BR ()
Stack now 0 1
Error: popping token MAIN ()
Stack now 0
Cleanup: discarding lookahead token BBS_ST ()
Stack now 0

Answer 1

我重新整理了您的一些规则。问题是当 functn 看到 bb 时，它只期望一个。因此，当您的 GOTO 达到时，预计不会有更多令牌。允许 functns 出现在 bb 语句之后应该可以解决此问题并为您提供所需的行为。

functn: INT_DT VAR ";" functn
      | bb_stmt functn
      | bb_stmt
      ;

bb_stmt: BBS_ST /*P2*/ INT_LIT ">:" bb
       ;

如果你想继续强制语句只在函数的顶部，你可以在上面做一些这样的事情：

main:    "{"     functn_decls /*P1*/   "}"   ;

functn_decls: INT_DT VAR ";" functn_decls
            | functn
            ;

functn: bb_stmt functn
      | bb_stmt
      ;

Answer 2

您的语法与您认为的不一样。（在这里，我只使用简化语法，因为它比遍历整个语法更简单，而且如前所述，原理是相同的。）

start →  t1  V1

V1    →  t2   V1
      |  t3   V2

V2    →  t4
      |  /* Empty */

让我们问一个简单的问题：V2之后可以做什么？因为语法中唯一使用V2的地方是在产生式V1 → t3 V2，答案只能是：与 V1.

完全相同的标记

那么V1接下来会发生什么？同样，只有两个地方使用了 V1，而且都在作品的结尾。一个是递归的，所以对follow set没有贡献，另一个在start → t1 V1里。所以我们可以得出结论，唯一可以跟在 V1 之后的标记（因此也是唯一可以跟在 V2 之后的标记）是输入结束标记，通常写成 $.

所以t3不能跟在V1或V2后面，结果就是句子：

t1 t3 t3

不是语言的一部分；第二个 t3 无法解析。

更笼统地说，您似乎在尝试分析生成的解析器的行为，就好像它是自上而下的递归下降解析器一样。 Bison 不生成自上而下的解析器；它生成自下而上的 LALR(1) 解析器（默认情况下），并且您的 "control flow" 概念与 LR(k) 算法中发生的任何事情都不匹配。

此外，LR(1) 文法在左递归方面没有任何问题，因此无需像对自上而下的解析器生成器那样破坏文法。你会发现左递归效果更好，因为它实际上反映了语言的结构。如果你分析

statement1 ; statement2 ; statement3 ;

使用右递归语法：

Program → ε | Statement ; Program

您会发现 statement 减少是从右到左应用的，从您的角度来看这可能是向后的。那是因为 LR 解析器产生最右推导，所以它在到达输入末尾之前不会开始进行归约：

  statement1 ; statement2 ; statement3 ;
→ statement1 ; statement2 ; statement3 ; Program (Program → ε)
→ statement1 ; statement2 ; Program              (Program → Statement ; Program)
→ statement1 ; Program                           (Program → Statement ; Program)
→ Program                                        (Program → Statement ; Program)

很可能您真正想要的更像是以下内容（回到您简化的语法之类的内容）：

S  → t1 | S V1 ';'
V1 → t2 | t3 V2
V2 → t4 | /* Empty */

使 S 成为 t1 后跟 t2、t3 或 t3 t4 中的任意数字（可能为 0），每个后跟一个分号。

使用 Bison 实现语法。语法控制流意外

Implementation of grammar using Bison. Grammar control flow unexpected

c

grammar

parsing

bison

flex-lexer