C - Unexpected random characters being read from end of file
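I'm trying to read a comma-separated list of words from a csv file, but I'm having trouble with seemingly random characters that show up at the end of the file when it is read in C. When I add or remove words from the list, the characters at the end seem to change completely.
This is what the file contains: johnny,david,alan,rodney,bob,ronald,andrew,hola,goodbye
That is copied exactly, with no accidental space or carriage return at the end.
This is what the program reads in:
This is the code that reads in the text: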
char* name;
FILE *fp;
char *fcontent;
int wordCount = 0;
char delim = ',';
long fsize;
bool end = false;
char guessedLetters[26];
int guessNum = 0;
int lives = 0;
for (int i = 0; i < 26; i++) {
    guessedLetters[i] = '\0';
}
fp = fopen(WORDS_FILENAME, "r");
if (fp == NULL) {
    printf("Words File Exception: Exiting.");
    return 1;
}
fseek(fp, 0L, SEEK_END);
fsize = ftell(fp);
fseek(fp, 0L, SEEK_SET);
fcontent = (char*)calloc(fsize, sizeof(char));
if (fcontent == NULL) {
    printf("No words in file: Exiting.");
    return 1;
}
fread(fcontent, sizeof(char), fsize, fp);
char *fcontent2 = malloc(strlen(fcontent + 1));
strcpy(fcontent2, fcontent);
fclose(fp);
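The words are split into an array of words, and the rogue characters keep getting attached to the end of the last word, which causes a lot of problems later in the program.
This is the code that splits the string into the array wordArr: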
char wordArr[wordCount][15];
char *ptr2 = strtok(fcontent2, &delim);
int count = 0;
while (ptr2 != NULL) {
    strcpy(wordArr[count], ptr2);
    count++;
    ptr2 = strtok(NULL, &delim);
}
If the extra characters can't be kept out of the read altogether, could they perhaps be dropped during the split?
Thanks, Jack.
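On the splitting step itself, two details in the posted code look risky independently of the stray characters: strtok expects its second argument to be a null-terminated string, and &delim points at a single char with no terminator after it; and if wordCount is still 0 when wordArr is declared, the array has no rows. The following is only a sketch of a bounded split. The helper name split_words, the MAX_WORDS capacity, and the choice to treat line breaks as separators are assumptions, not part of the original post, and the sketch does not fix the missing terminator on the input.

```c
#include <stdio.h>
#include <string.h>

#define MAX_WORDS 32   /* assumed capacity, not from the original post */
#define MAX_LEN   15   /* matches the 15-byte rows of wordArr */

/* Hypothetical helper: splits `text` in place on commas and line breaks
 * and copies each word into `words`. Returns the number of words stored. */
int split_words(char *text, char words[][MAX_LEN], int max_words)
{
    const char *delims = ",\r\n";   /* a real string, so strtok sees a terminator */
    int count = 0;

    for (char *tok = strtok(text, delims);
         tok != NULL && count < max_words;
         tok = strtok(NULL, delims)) {
        /* snprintf truncates instead of overflowing a 15-byte row */
        snprintf(words[count], MAX_LEN, "%s", tok);
        count++;
    }
    return count;
}
```

Calling split_words(fcontent2, wordArr, MAX_WORDS) on a properly terminated buffer would fill wordArr and return the word count; words longer than 14 characters are truncated rather than written past the row.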
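The data you read does not contain a terminating null character.
You need to check the number of characters read and then "manually" set the terminating null character:
size_t cnt = fread(fcontent, sizeof(char), fsize, fp);
fcontent[cnt] = '\0';
Of course, it is good practice to check cnt before using it as an array index: fread returns the number of items actually read as a size_t, so it is never negative, but it is smaller than fsize after a read error or an early end of file. The buffer also needs at least fsize + 1 bytes for fcontent[cnt] to be in bounds.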
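As a minimal sketch of that idea, the helper below terminates the buffer at the count fread reports rather than at the requested size. The name read_terminated and its parameters are hypothetical, chosen only for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads up to `fsize` bytes from `fp` into a freshly
 * allocated buffer and null-terminates it at the number of bytes actually
 * read. Returns NULL on allocation failure or read error. */
char *read_terminated(FILE *fp, size_t fsize, size_t *out_len)
{
    char *buf = malloc(fsize + 1);          /* +1 leaves room for '\0' */
    if (buf == NULL)
        return NULL;

    size_t cnt = fread(buf, 1, fsize, fp);  /* bytes read, never negative */
    if (cnt < fsize && ferror(fp)) {        /* short count caused by an error */
        free(buf);
        return NULL;
    }

    buf[cnt] = '\0';                        /* in bounds because cnt <= fsize */
    if (out_len != NULL)
        *out_len = cnt;
    return buf;
}
```

A short count that comes from reaching end of file is not treated as an error here; the string simply ends where the data ended.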
First, you open the file in text mode:
fp = fopen(WORDS_FILENAME, "r");
Per the C standard, 7.21.9.4 The ftell function, paragraph 2:
The ftell function obtains the current value of the file position indicator for the stream pointed to by stream. For a binary stream, the value is the number of characters from the beginning of the file. For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read.
You cannot use ftell() on a text stream to determine how many bytes may be read.
So you have to open the file in binary mode to use ftell() (but see the note below):
fp = fopen(WORDS_FILENAME, "rb");
Now you can get the file size with:
fseek(fp, 0L, SEEK_END);
fsize = ftell(fp);
fseek(fp, 0L, SEEK_SET);
fcontent = (char*)calloc(fsize, sizeof(char));
However, that leaves no room for any '\0' terminator, so it should be
// no need to cast a void * in C, and sizeof(char)
// is **always** one by definition
fcontent = calloc(fsize + 1 , 1);
Now the file contents will be a properly terminated string.
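Putting the binary-mode open, the size query, and the fsize + 1 allocation together might look like the sketch below. The function name read_whole_file is hypothetical, and the sketch assumes a platform where seeking to the end of a binary stream works.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: loads the file at `path` into a null-terminated
 * heap buffer. Returns NULL on failure; the caller frees the result. */
char *read_whole_file(const char *path)
{
    FILE *fp = fopen(path, "rb");           /* binary mode: ftell() counts bytes */
    if (fp == NULL)
        return NULL;

    char *buf = NULL;
    if (fseek(fp, 0L, SEEK_END) == 0) {
        long fsize = ftell(fp);             /* -1L on failure */
        if (fsize >= 0 && fseek(fp, 0L, SEEK_SET) == 0) {
            buf = calloc((size_t)fsize + 1, 1);   /* extra byte for '\0' */
            if (buf != NULL) {
                size_t cnt = fread(buf, 1, (size_t)fsize, fp);
                buf[cnt] = '\0';            /* terminate at the bytes actually read */
            }
        }
    }
    fclose(fp);
    return buf;
}
```

Because the buffer is already a terminated string, a second copy such as fcontent2 is only needed if the original must stay unmodified, and in that case it needs strlen(fcontent) + 1 bytes rather than strlen(fcontent + 1).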
Note regarding fseek() on binary streams
Per the C standard, using fseek() to get to the end of a binary stream is actually undefined behavior.
Per 7.21.9.2 The fseek function, paragraph 3:
For a binary stream, the new position, measured in characters from the beginning of the file, is obtained by adding offset to the position specified by whence. The specified position is the beginning of the file if whence is SEEK_SET, the current value of the file position indicator if SEEK_CUR, or end-of-file if SEEK_END. A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END.
Footnote 268 even states:
Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state.
The only reason you can use fseek(fp, 0L, SEEK_END); is that most operating systems extend the C language and actually define it to work.
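If relying on that extension is a concern, one alternative is to skip the size query entirely and read in chunks until fread reports nothing more. This is only a sketch of that approach, not something the original answer proposed; the name read_stream and the starting capacity of 256 are arbitrary assumptions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything remaining in `fp` into a growing,
 * null-terminated heap buffer without ever asking for the file size. */
char *read_stream(FILE *fp, size_t *out_len)
{
    size_t cap = 256, len = 0;              /* initial capacity is arbitrary */
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* always keep one byte in reserve for the terminator */
        size_t got = fread(buf + len, 1, cap - len - 1, fp);
        len += got;
        if (got == 0)
            break;                          /* end of file or error */
        if (len == cap - 1) {               /* full: double the buffer */
            char *grown = realloc(buf, cap * 2);
            if (grown == NULL) {
                free(buf);
                return NULL;
            }
            buf = grown;
            cap *= 2;
        }
    }

    if (ferror(fp)) {                       /* the loop ended on a read error */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```

Since it never calls ftell(), this version makes no assumption about what a stream position means, so it can be used on a stream opened with either "r" or "rb".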