Keep lexer from reading first character in file

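I'm writing a lexer in C. While reading my test code, I noticed that the file buffer starts with a strange character that prints as a space. For some reason the lexer reads it from the buffer and treats it as whitespace.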

Test file: mo on

Output

Current character: " ", Length: 6, Pointer: 0
Current character: "m", Length: 6, Pointer: 1
Type:2 {
        Line:   1
        Pos:    0
        Number: 21646720
        Real:   21646720
        String: 'mo'
}

Current character: " ", Length: 6, Pointer: 3
Current character: "o", Length: 6, Pointer: 4
Type:2 {
        Line:   1
        Pos:    0
        Number: 21683576
        Real:   21683576
        String: 'o'
}
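
A `%c` debug print cannot tell a real space (0x20) apart from an unprintable byte that merely renders as blank. The sketch below shows one way to print the numeric value of each byte instead. `format_byte` and `dump_bytes` are hypothetical helpers, not part of the lexer above, and they assume the buffer is a plain `char` array with a known length.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write the value of c as "0xNN" into out.
   Returns what snprintf returns (4 when the text fits). */
static int format_byte(char *out, size_t out_size, char c) {
    /* Go through unsigned char so bytes >= 0x80 are not sign-extended. */
    return snprintf(out, out_size, "0x%02X", (unsigned)(unsigned char)c);
}

/* Hypothetical helper: print every byte of src with its index. */
static void dump_bytes(const char *src, size_t len) {
    char text[8];
    for (size_t i = 0; i < len; i++) {
        format_byte(text, sizeof text, src[i]);
        printf("byte %zu: %s\n", i, text);
    }
}
```

Calling `dump_bytes` on the source buffer right after the file is loaded shows whether the byte at Pointer 0 is really 0x20 or some other value.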

Code

static char lexer_look(lexer_t* lexer, size_t ahead) {
    if (lexer->len < lexer->ptr + ahead) {
        error_new(lexer->errors, 0, 0, "The lexer tried to index %d out of bounds %d", lexer->ptr + ahead, lexer->len);
        return '\0'; /* a non-void function must return a value; NUL means "no character" */
    }
    return lexer->src[lexer->ptr + ahead];
}

static token_t* next_token(lexer_t* lexer) {
    token_t* token = NULL;

    while (token == NULL && can_adv(lexer, 1)) {
        const char c = lexer_look(lexer, 0);

        if (DEBUG)
            printf("Current character: \"%c\", Length: %d, Pointer: %d\n", lexer_look(lexer, 0), lexer->len, lexer->ptr);

        switch (c) {
        case '\n':
            new_line(lexer);
            lexer_adv(lexer, 1);
            break;
        case '\"':
            token = lexer_str(lexer);
            break;
        case '#':
            lexer_comment(lexer);
            break;
        default:
            if (isalpha(c) || c == '_')
                token = lexer_ident(lexer);
            else if (isspace(c))
                lexer_adv(lexer, 1);
            else
                break;
        }
    }

    return token;
}

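c should be declared as an int whose value is limited to the range EOF..UCHAR_MAX so that isalpha() and isspace() behave reliably. Passing any other value, such as a negative plain char, is undefined behavior.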
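
A minimal sketch of that advice, assuming the lexer keeps its source in a plain char buffer: convert each byte to unsigned char before handing it to the <ctype.h> functions. The names is_ident_start and is_blank are hypothetical wrappers, not functions from the code above.

```c
#include <ctype.h>

/* Hypothetical wrapper: nonzero if c may start an identifier
   (a letter or an underscore). */
static int is_ident_start(char c) {
    /* The cast keeps bytes >= 0x80 from reaching isalpha() as
       negative values when plain char is signed. */
    return c == '_' || isalpha((unsigned char)c);
}

/* Hypothetical wrapper: nonzero if c is whitespace. */
static int is_blank(char c) {
    return isspace((unsigned char)c) != 0;
}
```

In next_token these wrappers could stand in for the direct isalpha(c) and isspace(c) calls, so a stray high byte at the start of the buffer is classified in a well-defined way.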