Trouble with a switch statement in a lexical analyzer

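I am trying to write a lexical analyzer in C. I have stepped through the code to find where the problem is, but I can't see it. The code reads one line from a file, assuming the file contains nothing but that first line. I tested it with "a = (b + 2) * c".

It works and prints a =, but nothing after that. I have figured out that the problem is in the switch statement in my lookup() function, since everything in lex() other than the UNKNOWN case seems to work fine. Any insight would be helpful and appreciated.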

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

int charClass;
char lexeme[100];
char nextChar;
int lexLen;
int token;
int nextToken;
FILE *fp;

void addChar();
void getChar();
void getNonBlank();
int lex();

#define LETTER 0
#define DIGIT 1
#define UNKNOWN 99

#define INT_LIT 10
#define IDENT 11
#define ASSIGN_OP 20
#define ADD_OP 21
#define SUB_OP 22
#define MULT_OP 23
#define DIV_OP 24
#define LEFT_PARENT 25
#define RIGHT_PARENT 26

int main(int argc, char *argv[])
{
    fp = fopen(argv[1], "r");

    if (fopen == NULL)
        printf("File can not be opened");
    else {
        getChar();

        while (nextToken != EOF) {
            lex();
        }
    }
    return 0;
}

int lookup(char ch) {
    switch (ch) {
    case '(':
        addChar();
        nextToken = LEFT_PARENT;
        break;
    case ')':
        addChar();
        nextToken = RIGHT_PARENT;
        break;
    case '+':
        addChar();
        nextToken = ADD_OP;
        break;
    case '-':
       addChar();
        nextToken = SUB_OP;
        break;
    case '*':
        addChar();
        nextToken = MULT_OP;
        break;
    case '/':
        addChar();
        nextToken = DIV_OP;
        break;
    default:
        addChar();
        nextToken = EOF;
    }
    return nextToken;
}

void addChar() {
    if (lexLen <= 98) {
        lexeme[lexLen++] = nextChar;
        lexeme[lexLen] = 0;
    } else
        printf("Error- lexele is too long...\n");
}

void getChar() {
    if ((nextChar = getc(fp)) != EOF) {
        if (isalpha(nextChar))
            charClass = LETTER;
        else if(isdigit(nextChar))
             charClass = DIGIT;
        else
            charClass = UNKNOWN;
    } else
        charClass =EOF;
}

void getNonBlank() {
    while (isspace(nextChar))
        getChar();
}

int lex() {
    lexLen = 0;
    getNonBlank();
    switch (charClass) {
    case LETTER:
        addChar();
        getChar();
        while (charClass == LETTER || charClass == DIGIT) {
            addChar();
            getChar();
        }
        nextToken = IDENT;
        break;
    case DIGIT:
        addChar();
        getChar();
        while (charClass == DIGIT) {
            addChar();
            getChar();
        }
        nextToken = INT_LIT;
        break;
    case UNKNOWN:
        lookup(nextChar);
        getChar();
        break;
    case EOF:
        nextToken = EOF;
        lexeme[0] = 'E';
        lexeme[1] = 'O';
        lexeme[2] = 'F';
        lexeme[3] = 0;
    }
    printf("Next token is :%d, next lexeme is %s\n", nextToken, lexeme);
    return nextToken;
}

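First, when you read a character and want to compare it with EOF, you must store the result in an int, not a char: getc() returns an int so that EOF stays distinct from every valid character. So your function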

void getChar()
{
    if ((nextChar = getc(fp)) != EOF) {
        if (isalpha(nextChar))
            charClass = LETTER;
        else if (isdigit(nextChar))
            charClass = DIGIT;
        else
            charClass = UNKNOWN;
    } else
        charClass = EOF;
}

does not handle EOF correctly, because nextChar is a char. It can be rewritten as:

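void getChar()
{
    /* charClass is an int, so it can hold the value returned by getc() */
    if ((charClass = getc(fp)) != EOF) {
        nextChar = charClass;
        if (isalpha(nextChar))
            charClass = LETTER;
        else if (isdigit(nextChar))
            charClass = DIGIT;
        else
            charClass = UNKNOWN;
    }
    /* at end of file charClass already equals EOF; nextChar is unchanged */
}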
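
To make the int-versus-char point concrete, here is a minimal sketch of the classification step taken on its own. The function name classify is hypothetical (it is not in the question); the class codes are copied from the question. The casts to unsigned char are an addition of mine: the <ctype.h> functions are only defined for EOF and for values representable as unsigned char, so a plain char that happens to be negative should not be passed to them directly.

```c
#include <ctype.h>
#include <stdio.h>

/* Class codes copied from the question. */
#define LETTER 0
#define DIGIT 1
#define UNKNOWN 99

/* Hypothetical helper: classify a value exactly as getc() returned it.
 * c is either EOF or a character value in the range of unsigned char. */
int classify(int c)
{
    if (c == EOF)                    /* checked while c is still an int */
        return EOF;
    if (isalpha((unsigned char)c))
        return LETTER;
    if (isdigit((unsigned char)c))
        return DIGIT;
    return UNKNOWN;
}
```

It would be called as classify(getc(fp)), keeping the result of getc() in an int until after the EOF test. Because the byte value 0xFF arrives as 255 rather than -1, it is never mistaken for end of file.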

Second, you forgot to handle the '=' case, so in lookup(char ch) you have to add:

    case '=':
        addChar();
        nextToken = ASSIGN_OP;
        break;

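That is why you stop after reading '=': it falls into the default branch, which sets nextToken to EOF and so ends the while loop in main.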
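
As a side note, the same early stop would happen for any other character lookup() does not know. One possible way to avoid that, sketched here under my own assumptions, is to return a separate code for unrecognized characters instead of EOF. The names token_for_char and BAD_CHAR (and the value 98) are hypothetical and not part of the question; the other token codes are copied from it.

```c
/* Token codes copied from the question. */
#define ASSIGN_OP 20
#define ADD_OP 21
#define SUB_OP 22
#define MULT_OP 23
#define DIV_OP 24
#define LEFT_PARENT 25
#define RIGHT_PARENT 26

/* Hypothetical code for an unrecognized character (not in the question). */
#define BAD_CHAR 98

/* Map a single operator or parenthesis character to its token code. */
int token_for_char(char ch)
{
    switch (ch) {
    case '=': return ASSIGN_OP;
    case '+': return ADD_OP;
    case '-': return SUB_OP;
    case '*': return MULT_OP;
    case '/': return DIV_OP;
    case '(': return LEFT_PARENT;
    case ')': return RIGHT_PARENT;
    default:  return BAD_CHAR;  /* not EOF, so the caller's loop keeps going */
    }
}
```

With this variant, lookup() could call addChar() once and then set nextToken = token_for_char(ch), and the loop in main would only end on a real end of file.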

With these two changes (the int-based getChar() and the '=' case):

pi@raspberrypi:/tmp $ gcc -g -Wextra q.c
pi@raspberrypi:/tmp $ cat in
a = (b + 2) * c
pi@raspberrypi:/tmp $ ./a.out in
Next token is :11, next lexeme is a
Next token is :20, next lexeme is =
Next token is :25, next lexeme is (
Next token is :11, next lexeme is b
Next token is :21, next lexeme is +
Next token is :10, next lexeme is 2
Next token is :26, next lexeme is )
Next token is :23, next lexeme is *
Next token is :11, next lexeme is c
^C

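I had to kill the program because it loops forever. The EOF case is not handled in getNonBlank(): at end of file nextChar still holds the last character read (here the trailing newline), so isspace(nextChar) stays true. So: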

void getNonBlank()
{
    while((charClass != EOF) && isspace(nextChar))
        getChar();
}
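
For comparison, here is a sketch of the same skipping logic written without the global nextChar and charClass, so that the end-of-file test and the whitespace test sit side by side in one loop condition. The function name next_non_blank is hypothetical; it reads directly from a stream passed as a parameter and is not a drop-in replacement for getNonBlank().

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: return the first non-whitespace character read
 * from `in`, or EOF if only whitespace (or nothing) remains. */
int next_non_blank(FILE *in)
{
    int c;  /* int, so that EOF can be told apart from real characters */

    /* Test for EOF first; only then is c safe to pass to isspace(). */
    while ((c = getc(in)) != EOF && isspace((unsigned char)c))
        ;   /* skip the whitespace character */
    return c;
}
```

The order of the two tests matters for the same reason as in the fix above: once EOF is reached, the loop must stop regardless of what the last character was.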

After that change to getNonBlank():

pi@raspberrypi:/tmp $ ./a.out in
Next token is :11, next lexeme is a
Next token is :20, next lexeme is =
Next token is :25, next lexeme is (
Next token is :11, next lexeme is b
Next token is :21, next lexeme is +
Next token is :10, next lexeme is 2
Next token is :26, next lexeme is )
Next token is :23, next lexeme is *
Next token is :11, next lexeme is c
Next token is :-1, next lexeme is EOF

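As chqrlie said in a comment, also replace if (fopen == NULL) with if (fp == NULL). fopen is the function itself, so that comparison is never true and a failed open goes unnoticed.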
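
A minimal sketch of a more defensive open, assuming you also want to guard against a missing command-line argument (the question's main passes argv[1] to fopen without checking argc). The helper name open_input and the fallback program name "lexer" are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: open the file named by argv[1] for reading.
 * Returns NULL, after printing a message to stderr, on any failure. */
FILE *open_input(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argc > 0 ? argv[0] : "lexer");
        return NULL;
    }

    FILE *in = fopen(argv[1], "r");
    if (in == NULL)          /* test the returned pointer, not the function */
        perror(argv[1]);
    return in;
}
```

In main this would be used as fp = open_input(argc, argv); followed by an early return if fp is NULL.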