Is it possible to read null characters correctly using fgets or gets_s?

Question

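Suppose I want to read from stdin and let the user enter strings that contain null characters. Is this possible with string-input functions such as fgets or gets_s? Or do I have to use, for example, fgetc or fread?

Someone here wanted to do exactly this.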

Answer 1

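For fgets, yes. fgets is specified to behave as if by repeated fgetc, storing the resulting characters into the array. There is no special provision for null characters, except that, in addition to the characters read, a null character is stored at the end (right after the last character read).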

However, telling embedded null characters apart from the terminator takes some work.
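To make the problem concrete, here is a minimal sketch. `naive_line_length` is a hypothetical name used only for illustration. It reads a line with fgets and measures it with strlen, which stops at the first null byte whether that byte came from the input or from fgets itself.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical demo helper (not a library function): read one line with
 * fgets() and return the length that strlen() reports for it. */
size_t naive_line_length(char *buf, int size, FILE *in)
{
    if (fgets(buf, size, in) == NULL)
        return 0;               /* EOF with nothing read, or a read error */

    /* strlen() stops at the FIRST '\0' in buf. If the input itself
     * contained a null byte, everything after it is silently ignored,
     * even though fgets() did store those bytes in buf. */
    return strlen(buf);
}
```

For the four input bytes 'a', '\0', 'b', '\n', this helper reports a length of 1, although all four bytes are in the buffer, followed by the terminator.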

First, prefill the buffer with '\n' (for example with memset). Then, when fgets returns, look for the first '\n' in the buffer (for example with memchr).

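If there is no '\n', fgets stopped because the output buffer was filled, and everything except the last byte (the null terminator) is data read from the file.
If the first '\n' is immediately followed by a '\0' (the null terminator), fgets stopped because it reached a newline, and everything up to and including that newline was read from the file.
If the first '\n' is not followed by a '\0' (it is either at the end of the buffer or followed by another '\n'), then fgets stopped because of EOF or an error, and everything up to, but not including, the byte just before the '\n' (which is necessarily a '\0') was read from the file.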
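The three cases can be sketched as follows. `fgets_len` is a hypothetical helper name. The sketch assumes a buffer of at least 2 bytes and an fgets that leaves the bytes after its terminator untouched, which common implementations do but the C standard does not promise.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a library function): read one chunk with
 * fgets() and return how many bytes came from the stream, embedded
 * '\0' bytes included. Returns -1 if nothing was read.
 * ASSUMPTIONS: size >= 2, and fgets() does not modify the bytes that
 * follow the terminator it appends. */
long fgets_len(char *buf, size_t size, FILE *in)
{
    memset(buf, '\n', size);                /* prefill with '\n' */
    if (fgets(buf, (int)size, in) == NULL)
        return -1;                          /* EOF with no data, or error */

    char *nl = memchr(buf, '\n', size);     /* first '\n' in the buffer */
    if (nl == NULL)
        return (long)size - 1;              /* case 1: buffer was filled */

    size_t pos = (size_t)(nl - buf);
    if (pos + 1 < size && nl[1] == '\0')
        return (long)pos + 1;               /* case 2: newline was read */

    /* case 3: this '\n' is prefill, so the byte before it is the
     * terminator and the data ends just before that. */
    return (long)pos - 1;
}
```

A caller would then process exactly that many bytes of `buf` with memcpy or fwrite rather than with the str* functions.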

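For gets_s, I have no idea, and I would strongly recommend against using it. The only widely implemented version of the Annex K "*_s" functions, Microsoft's, does not even conform to the specification they pushed into the annex of the C standard, and it reportedly has issues that could keep this approach from working.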

Answer 2

Is it possible to read null characters correctly using fgets or gets_s?

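Not really.

fgets() is not specified to leave the rest of the buffer alone (the part after the appended '\0'), so preloading the buffer for later analysis is not specified to work.

On a read error the buffer is specified as "the array contents are indeterminate", but that case can be ruled out by checking the return value.

Were it not for that, the various suggested tests would work.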

  char buf[80];
  int length = 0;
  memset(buf, '\n', sizeof buf);  // memset(dest, fill value, count)
  // Check return value before reading `buf`.
  if (fgets(buf, sizeof buf, stdin)) {
    // The buffer should end with a \0 and 0 to 78 \n
    // Starting at the end, look for the first non-\n
    int i = sizeof buf - 1;
    while (i > 0) {
      if (buf[i] != '\n') {
        if (buf[i] == '\0') {
          // found appended null
          length = i;
        } else {
          length = -1;  // indeterminate length
        }
        break;
      }
      i--;
    }
    if (i == 0) {
      // entire buffer was \n
      length = -1;  // indeterminate length
    }
  }

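fgets() is simply not up to the task of reading user input that may contain null characters. This remains a hole in C.

I tried coding this fgets() Alternative, but I am not fully satisfied with it.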

Answer 3

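There is a way to reliably detect \0 characters read by fgets(3), but it is very inefficient. To reliably detect that a null character was read from the input stream, you must first fill the buffer with non-null characters. The reason is that fgets() delimits its input by placing a \0 at the end of it and (supposedly) writes nothing past that character.

So, after filling the input buffer with, say, \01 characters, call fgets() on your buffer and search backwards from the end of the buffer for a \0 character: that is the end of the input. There is no need to check the character before it. (The only cases in which it is not a \n are when the input line was too long to fit in the buffer as a complete, nul-terminated string, so that the \0 is the last byte of the buffer, or when the implementation of fgets(3) is bogus; there are some.) Before that point there may be any number of \0 characters, but don't worry, they come from the input stream.

As you can see, this is very inefficient.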

  #include <stdio.h>      /* fgets, FILE, EOF */
  #include <string.h>     /* memset */
  #include <sys/types.h>  /* ssize_t (POSIX) */

  #define NON_ZERO     1
  #define BOGUS_FGETS -2 /* -1 is used by EOF */

  /**
   * variant of fgets that returns the number of characters actually read
   */
  ssize_t variant_of_fgets(char *buffer, const size_t sz, FILE *in)
  {
      /* set buffer to non zero value */
      memset(buffer, NON_ZERO, sz);

      /* do actual fgets (use sz and in, not sizeof buffer and stdin:
       * buffer is a pointer here, so sizeof buffer is the pointer size) */
      if (!fgets(buffer, (int)sz, in)) {
          /* EOF */
          return EOF;
      }

      /* search backwards for the last \0 without stepping before buffer */
      char *p = buffer + sz;
      while (p > buffer && *--p)
          continue;

      /* ASSERT: (p == buffer)[not found, or found at offset 0]
       *      || (p > buffer)[found] */
      if (p <= buffer) {
          /* Why do we check for p <= buffer ?
           * p must be > buffer, as if p == buffer
           * the implementation must be also bogus, because
           * the returned string should be an empty string "".
           * this can happen only with a bogus implementation
           * or an absurd buffer of length one (with only place for
           * the \0 char).  Else, it must be a read character
           * (it can be a \0, but then it must have another \0
           * behind, and p must be greater than this) */
          return BOGUS_FGETS;
      }

      /* ASSERT: p > buffer && p < buffer + sz [found a \0]
       * p points to the position of the last \0 in the buffer */
      return p - buffer; /* this is the string length */
  } /* variant_of_fgets */

Example

The sample code below illustrates this. First, a sample run:

  $ pru
  ===============================================
  <OFFSET> : pru.c:24:main: buffer initial contents
  00000000 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 : ................
  00000010 : e0 dd cf eb 02 56 00 00 e0 d7 cf eb 02 56 00 00 : .....V.......V..
  00000020
  <OFFSET> : pru.c:30:main: buffer after memset
  00000000 : fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa : ................
  00000010 : fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa : ................
  00000020
  ^@^@^@^@^D^D
  <OFFSET> : pru.c:41:main: buffer after fgets(returned size should be 4)
  00000000 : 00 00 00 00 00 fa fa fa fa fa fa fa fa fa fa fa : ................
  00000010 : fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa : ................
  00000020
  ===============================================
  <OFFSET> : pru.c:24:main: buffer initial contents
  00000000 : 00 00 00 00 00 fa fa fa fa fa fa fa fa fa fa fa : ................
  00000010 : fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa : ................
  00000020
  <OFFSET> : pru.c:30:main: buffer after memset
  00000000 : fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa : ................
  00000010 : fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa : ................
  00000020
  ^D
  <OFFSET> : pru.c:41:main: buffer after fgets(returned size should be 0)
  00000000 : fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa : ................
  00000010 : fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa : ................
  00000020
  ===============================================
  pru.c:45:main: END OF PROGRAM
  $ _

Makefile

  RM ?= rm -f

  targets = pru
  toclean += $(targets)

  all: $(targets)
  clean:
  	$(RM) $(toclean)

  pru_objs = pru.o fprintbuf.o
  toclean += $(pru_objs)

  pru: $(pru_objs)
  	$(CC) -o $@ $($@_objs)

pru.c

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <stdint.h>

  #include "fprintbuf.h"

  #define F(fmt) __FILE__":%d:%s: " fmt, __LINE__, __func__

  void line()
  {
      puts("===============================================");
  }

  int main()
  {
      uint8_t buffer[32];
      int eof;

      line();
      do {
          fprintbuf(stdout, buffer, sizeof buffer, F("buffer initial contents"));
          memset(buffer, 0xfa, sizeof buffer);
          fprintbuf(stdout, buffer, sizeof buffer, F("buffer after memset"));
          /* fgets wants a char *, the buffer is uint8_t[] */
          eof = !fgets((char *)buffer, sizeof buffer, stdin);
          /* search for the last \0 */
          uint8_t *p = buffer + sizeof buffer;
          while (*--p && (p > buffer))
              continue;
          if (p <= buffer)
              printf(F("BOGUS implementation\n"));
          /* p - buffer is a ptrdiff_t, so cast it to match %u */
          fprintbuf(stdout, buffer, sizeof buffer,
                  F("buffer after fgets(size should be %u)"),
                  (unsigned)(p - buffer));
          line();
      } while (!eof);
  }

A helper function prints the buffer contents:

fprintbuf.h

  /* $Id: fprintbuf.h,v 2.0 2005-10-04 14:54:49 luis Exp $
   * Author: Luis Colorado <Luis.Colorado@HispaLinux.ES>
   * Date: Thu Aug 18 15:47:09 CEST 2005
   *
   * Disclaimer:
   *  This program is free software; you can redistribute it and/or modify
   *  it under the terms of the GNU General Public License as published by
   *  the Free Software Foundation; either version 2 of the License, or
   *  (at your option) any later version.
   *
   *  This program is distributed in the hope that it will be useful,
   *  but WITHOUT ANY WARRANTY; without even the implied warranty of
   *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   *  GNU General Public License for more details.
   *
   *  You should have received a copy of the GNU General Public License
   *  along with this program; if not, write to the Free Software
   *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
   */
  #ifndef FPRINTBUF_H
  #define FPRINTBUF_H

  #ifdef __cplusplus
  extern "C" {
  #endif /* __cplusplus */

  #include <stdio.h>
  #include <stdint.h>

  size_t fprintbuf (
      FILE *f,            /* output file */
      const uint8_t *b,   /* pointer to the buffer */
      size_t t,           /* size of the buffer */
      const char *fmt,    /* header label */
      ...);

  #ifdef __cplusplus
  } /* extern "C" */
  #endif /* __cplusplus */

  #endif /* FPRINTBUF_H */

fprintbuf.c

  /* $Id: fprintbuf.c,v 2.0 2005-10-04 14:54:49 luis Exp $
   * AUTHOR: Luis Colorado <licolorado@indra.es>
   * DATE: 7.10.92.
   * DESC: shows a data buffer in hexadecimal and ASCII.
   */

  #include <sys/types.h>
  #include <ctype.h>
  #include <stdio.h>
  #include <stdarg.h>

  #include "fprintbuf.h"

  #define TAM_REG 16  /* bytes per output row */

  size_t fprintbuf(
      FILE *f,            /* output file */
      const uint8_t *b,   /* pointer to the buffer */
      size_t t,           /* size of the buffer */
      const char *fmt,    /* header label */
      ...)
  {
      size_t off = 0, i;
      va_list lista;
      size_t escritos = 0;  /* characters written so far */

      if (fmt) {
          escritos += fprintf(f, "<OFFSET> : ");
          va_start(lista, fmt);
          escritos += vfprintf(f, fmt, lista);
          va_end(lista);
          escritos += fprintf(f, "\n");
      }

      while (t > 0) {
          /* bytes on this row; computing it up front avoids decrementing
           * the unsigned counter t below zero on a short last row */
          size_t n = t < TAM_REG ? t : TAM_REG;

          escritos += fprintf(f, "%08lx : ", (unsigned long)off);
          /* hex column, padded to TAM_REG entries */
          for (i = 0; i < TAM_REG; i++) {
              if (i < n)
                  escritos += fprintf(f, "%02x ", b[i]);
              else
                  escritos += fprintf(f, "   ");
          }
          escritos += fprintf(f, ": ");
          /* ASCII column: non-printable bytes are shown as '.' */
          for (i = 0; i < n; i++)
              escritos += fprintf(f, "%c", isprint(b[i]) ? b[i] : '.');
          escritos += fprintf(f, "\n");

          b += n;
          off += n;
          t -= n;
      }
      escritos += fprintf(f, "%08lx\n", (unsigned long)off);

      return escritos;
  } /* fprintbuf */

Answer 4

Is it possible to read null characters correctly using fgets or gets_s?

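As some of the other answers show, the answer is apparently "Yes -- just barely." Similarly, it is possible to hammer in nails with a screwdriver. Similarly, it is possible to write (what amounts to) BASIC or FORTRAN code in C.

But none of these things is a good idea. Use the right tool for the job. If you want to drive nails, use a hammer. If you want to write BASIC or FORTRAN, use a BASIC interpreter or a FORTRAN compiler. If you want to read binary data that may contain null characters, use fread (or perhaps getc). Do not use fgets, because its interface was never designed for this task.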
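As a rough illustration of the getc route, here is a minimal sketch of a line reader that reports a byte count instead of relying on a terminator. `read_line_bytes` is a hypothetical name, not a standard function.

```c
#include <stdio.h>

/* Hypothetical helper (not a library function): read at most `size`
 * bytes of one line from `in` into `buf`, stopping after a '\n' or at
 * EOF. No terminator is appended; the return value is the number of
 * bytes stored, so embedded '\0' bytes are unambiguous.
 * Returns 0 at EOF or on a read error (use feof/ferror to tell which). */
size_t read_line_bytes(char *buf, size_t size, FILE *in)
{
    size_t n = 0;
    int c;

    /* Check for room BEFORE calling getc so no byte is read and lost. */
    while (n < size && (c = getc(in)) != EOF) {
        buf[n++] = (char)c;
        if (c == '\n')
            break;              /* the newline is kept, as with fgets */
    }
    return n;
}
```

Because the length is returned explicitly, the caller never needs strlen, and there is no need to prefill the buffer or scan it afterwards.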
