Keep lexer from reading first character in file
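I'm writing a lexer in C, and I noticed that when it reads my test code, the file buffer contains a strange character that prints as a space. For some reason the lexer reads this character from the buffer and treats it as whitespace.
Test file: mo on
Output: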
Current character: " ", Length: 6, Pointer: 0
Current character: "m", Length: 6, Pointer: 1
Type:2 {
Line: 1
Pos: 0
Number: 21646720
Real: 21646720
String: 'mo'
}
Current character: " ", Length: 6, Pointer: 3
Current character: "o", Length: 6, Pointer: 4
Type:2 {
Line: 1
Pos: 0
Number: 21683576
Real: 21683576
String: 'o'
}
Code:
static char lexer_look(lexer_t* lexer, size_t ahead) {
    if (lexer->len < lexer->ptr + ahead) {
        error_new(lexer->errors, 0, 0, "The lexer tried to index %d out of bounds %d", lexer->ptr + ahead, lexer->len);
        return;
    }
    return lexer->src[lexer->ptr + ahead];
}

static token_t* next_token(lexer_t* lexer) {
    token_t* token = NULL;
    while (token == NULL && can_adv(lexer, 1)) {
        const char c = lexer_look(lexer, 0);
        if (DEBUG)
            printf("Current character: \"%c\", Length: %d, Pointer: %d\n", lexer_look(lexer, 0), lexer->len, lexer->ptr);
        switch (c) {
        case '\n':
            new_line(lexer);
            lexer_adv(lexer, 1);
            break;
        case '\"':
            token = lexer_str(lexer);
            break;
        case '#':
            lexer_comment(lexer);
            break;
        default:
            if (isalpha(c) || c == '_')
                token = lexer_ident(lexer);
            else if (isspace(c))
                lexer_adv(lexer, 1);
            else
                break;
        }
    }
    return token;
}
c should be defined as an int whose value is restricted to the range EOF..UCHAR_MAX, so that isalpha() and isspace() behave reliably.
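To illustrate the point about the argument range, here is a minimal sketch of one common way to satisfy it: convert each char to unsigned char before handing it to the <ctype.h> functions. The helper names is_ident_start and is_blank are hypothetical and do not come from the lexer in the question. This only makes the classification calls well defined for bytes above 0x7F; it is not claimed to explain where the stray leading character comes from.

```c
#include <ctype.h>

/* Hypothetical helper (not part of the original lexer): true if ch may start
   an identifier. The cast keeps the argument within 0..UCHAR_MAX, which is
   the range isalpha() is defined for. */
static int is_ident_start(char ch) {
    return isalpha((unsigned char)ch) || ch == '_';
}

/* Hypothetical helper: true if ch is whitespace, using the same cast so that
   bytes with the high bit set are never passed as negative values. */
static int is_blank(char ch) {
    return isspace((unsigned char)ch) != 0;
}
```

In next_token, the default branch could then call is_ident_start(c) and is_blank(c) in place of the bare isalpha(c) and isspace(c), assuming c remains a char read from the source buffer.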