Difference between %ms and %s in scanf

Reading the scanf manual, I came across this line:

An optional 'm' character. This is used with string conversions (%s, %c, %[),

Can someone explain it with a simple example, showing how this option differs from plain %s and when it is needed?

The C Standard does not define such an optional character in the scanf() formats.

The GNU C library (glibc) does define an optional a indicator this way (from the man page for scanf):

An optional a character. This is used with string conversions, and relieves the caller of the need to allocate a corresponding buffer to hold the input: instead, scanf() allocates a buffer of sufficient size, and assigns the address of this buffer to the corresponding pointer argument, which should be a pointer to a char * variable (this variable does not need to be initialized before the call).

The caller should subsequently free this buffer when it is no longer required. This is a GNU extension; C99 employs the a character as a conversion specifier (and it can also be used as such in the GNU implementation).

The NOTES section of the man page says:

The a modifier is not available if the program is compiled with gcc -std=c99 or gcc -D_ISOC99_SOURCE (unless _GNU_SOURCE is also specified), in which case the a is interpreted as a specifier for floating-point numbers (see above).

Since version 2.7, glibc also provides the m modifier for the same purpose as the a modifier. The m modifier has the following advantages:

  • It may also be applied to %c conversion specifiers (e.g., %3mc).

  • It avoids ambiguity with respect to the %a floating-point conversion specifier (and is unaffected by gcc -std=c99 etc.)

  • It is specified in the upcoming revision of the POSIX.1 standard.
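The first advantage above can be illustrated with a short sketch. It assumes a glibc (2.7 or later) or POSIX.1-2008 system; the helper name `first_three` is hypothetical. Note that a buffer allocated for %c is not null-terminated, so the sketch copies it into a proper string before returning it.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'ed, null-terminated copy of the
   first 3 characters of `input`, or NULL if the input is shorter than
   3 characters or an allocation fails. The caller must free() the result. */
char *first_three(const char *input) {
    char *raw = NULL;

    if (strlen(input) < 3)
        return NULL;                 /* not enough characters for %3mc */
    if (sscanf(input, "%3mc", &raw) != 1)
        return NULL;                 /* conversion or allocation failed */

    /* %mc stores exactly 3 characters and no terminating '\0',
       so build a proper C string from them. */
    char *s = malloc(4);
    if (s != NULL) {
        memcpy(s, raw, 3);
        s[3] = '\0';
    }
    free(raw);
    return s;
}
```

Unlike %s, %c does not skip leading whitespace, so `first_three(" abc")` would yield a string starting with a space.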

The online Linux manual page at http://linux.die.net/man/3/scanf only documents this option as:

An optional 'm' character. This is used with string conversions (%s, %c, %[), and relieves the caller of the need to allocate a corresponding buffer to hold the input: instead, scanf() allocates a buffer of sufficient size, and assigns the address of this buffer to the corresponding pointer argument, which should be a pointer to a char * variable (this variable does not need to be initialized before the call). The caller should subsequently free(3) this buffer when it is no longer required.

The POSIX standard documents this extension in its POSIX.1-2008 edition (see http://pubs.opengroup.org/onlinepubs/9699919799/functions/fscanf.html):

The %c, %s, and %[ conversion specifiers shall accept an optional assignment-allocation character m, which shall cause a memory buffer to be allocated to hold the string converted including a terminating null character. In such a case, the argument corresponding to the conversion specifier should be a reference to a pointer variable that will receive a pointer to the allocated buffer. The system shall allocate a buffer as if malloc() had been called. The application shall be responsible for freeing the memory after usage. If there is insufficient memory to allocate a buffer, the function shall set errno to [ENOMEM] and a conversion error shall result. If the function returns EOF, any memory successfully allocated for parameters using assignment-allocation character m by this call shall be freed before the function returns.
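As a sketch of what this wording means for the %[ conversion, the following reads a whole line into an allocated buffer and checks errno for the ENOMEM case the specification mentions. It assumes a POSIX.1-2008 or glibc system; the helper name `first_line` is hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'ed copy of the first line of
   `text` (without the newline), or NULL if that line is empty, the text
   is empty, or memory could not be allocated. Caller must free(). */
char *first_line(const char *text) {
    char *line = NULL;

    errno = 0;
    if (sscanf(text, "%m[^\n]", &line) != 1) {
        if (errno == ENOMEM)
            fprintf(stderr, "first_line: out of memory\n");
        return NULL;                 /* nothing was allocated for us */
    }
    return line;                     /* allocated as if by malloc() */
}
```

Because the buffer is allocated as if by malloc(), the caller releases it with free(), for example `char *l = first_line(text); ... free(l);`.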

Using this extension, you could write:

char *p;
scanf("%ms", &p);

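This causes scanf to parse a word from standard input and allocate enough memory to store its characters plus a terminating '\0'. A pointer to the allocated array is stored into p, and scanf() returns 1 unless no non-whitespace characters can be read from stdin.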
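A slightly fuller sketch of the same idea checks the return value and leaves freeing to the caller. It uses sscanf on a string so the behavior is easy to reproduce; the helper name `first_word` is hypothetical, and a glibc (2.7 or later) or POSIX.1-2008 system is assumed.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'ed copy of the first
   whitespace-delimited word in `input`, or NULL if there is none.
   The caller must free() the result. */
char *first_word(const char *input) {
    char *word = NULL;

    /* %ms skips leading whitespace, then allocates a buffer large
       enough for the word plus its terminating '\0'. */
    if (sscanf(input, "%ms", &word) != 1)
        return NULL;                 /* no word found: nothing to free */
    return word;
}
```

A caller would use it as `char *w = first_word(line); if (w) { puts(w); free(w); }`. With plain %s the caller would instead have to guess a buffer size in advance.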

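It is entirely possible that other systems use m with similar semantics, or for something else altogether. Extensions beyond the C Standard are not portable: use them very carefully, document them as such, and reserve them for cases where a standard approach is cumbersome, impractical, or altogether impossible.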

Note that parsing a word of arbitrary size is indeed impossible with the standard version of scanf():

You can parse a word with a maximum length, and you should specify the maximum number of characters to store before the '\0':

char buffer[20];
scanf("%19s", buffer);
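One way to find out whether the word was cut off at the limit is to record how much input was consumed with %n and look at the next character. This is only a sketch using standard C; the helper name `read_bounded_word` and its return convention are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: stores at most 19 characters of the first word of
   `input` into `buffer`. Returns 1 if the whole word fit, 0 if it was
   truncated at 19 characters, and -1 if no word was found. */
int read_bounded_word(const char *input, char buffer[20]) {
    int consumed = 0;

    /* %n stores the number of characters consumed so far and does not
       count toward the return value of sscanf. */
    if (sscanf(input, "%19s%n", buffer, &consumed) != 1)
        return -1;

    /* If the next unread character is not whitespace or the end of the
       string, the word continues past the 19-character limit. */
    unsigned char next = (unsigned char)input[consumed];
    return (next == '\0' || isspace(next)) ? 1 : 0;
}
```

The same check on stdin would need a getchar()/ungetc() pair to peek at the next character instead of indexing into a string.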

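But this does not tell you how many more characters are available to parse in standard input. In any case, omitting the maximum number of characters may invoke undefined behavior if the input is long enough, and an attacker may even use specially crafted input to compromise your program: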

char buffer[20];
scanf("%s", buffer); // potential undefined behavior,
                     // that could be exploited by an attacker.
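Where %ms is not available, a portable alternative is to read characters one at a time and grow the buffer as needed. The following is a minimal sketch using only standard C, not a drop-in replacement for scanf; the helper name `read_word` and the initial capacity of 16 are arbitrary choices.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one whitespace-delimited word of any length
   from `in`. Returns a malloc'ed string the caller must free(), or NULL
   at end of input or if memory runs out. */
char *read_word(FILE *in) {
    int c;

    /* Skip leading whitespace, as %s does. */
    while ((c = getc(in)) != EOF && isspace(c))
        continue;
    if (c == EOF)
        return NULL;

    size_t len = 0, cap = 16;
    char *word = malloc(cap);
    if (word == NULL)
        return NULL;

    do {
        if (len + 1 >= cap) {        /* keep room for the final '\0' */
            char *bigger = realloc(word, cap * 2);
            if (bigger == NULL) {
                free(word);
                return NULL;
            }
            word = bigger;
            cap *= 2;
        }
        word[len++] = (char)c;
    } while ((c = getc(in)) != EOF && !isspace(c));

    if (c != EOF)
        ungetc(c, in);               /* leave the delimiter unread */
    word[len] = '\0';
    return word;
}
```

Calling `read_word(stdin)` in a loop returns successive words until it yields NULL, with no fixed limit on word length.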