malloc of wrong string size

I'm doing some tests on the use of the malloc function. This is the code I'm using:

#include <stdio.h>
#include <stdlib.h>

int main(){
    char * string = NULL;
    int size = 0;

    printf("Insert the number of caracters that u want to enter: ");
    scanf("%d", &size);

    string = malloc(size * sizeof(char));   //(char *)calloc(size, char)
    if(string != NULL){
        printf("Insert the string of %d size: ", size);
        scanf("%s", string);    //gets(string)

        printf("The string that u entered is \"%s\"\n", string);
    }
    
    free(string);

    return 0;
}

Now, if I enter size = 4 and then a string like "12345678910", the printf output is still correct. Even if I enter size 0 and a long string, the printf output is correct. Why does this happen?

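First, you should not call malloc with 0. According to the specification, you get either a null pointer or a unique pointer that can be passed to free, and in neither case is there any storage you may write to. Second, scanf with a plain %s is not safe. It does not know how big the buffer you gave it is, so if the string is longer than the buffer, it writes into memory that does not belong to you. That is undefined behavior, and it may appear to work by luck. Use something like fgets instead, which lets you specify how many characters to read. Also, when allocating a buffer for your string, you must account for the terminating null character.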

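As others have said, this goes out of bounds if scanf reads input that is too large. Out-of-bounds writes, reads of invalid memory and the like are all undefined behavior. Undefined behavior allows the code to crash or to appear to work fine, as you are seeing here.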

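For debugging cases where you suspect this kind of problem, I recommend AddressSanitizer (and the other sanitizers). They pinpoint exactly where the problem is in your program by turning various bugs and undefined behavior into reproducible hard errors.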

$ gcc -fsanitize=address foo.c -static-libasan -g
$ ./a.out
Insert the number of caracters that u want to enter: 2
Insert the string of 2 size: aaa
=================================================================
==27085==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000012 at pc 0x55efb85ade6a bp 0x7ffc2efa1b40 sp 0x7ffc2efa12c8
WRITE of size 4 at 0x602000000012 thread T0
    #0 0x55efb85ade69 in scanf_common(void*, int, bool, char const*, __va_list_tag*) (/home/ry/a.out+0x49e69)
    #1 0x55efb85aec38 in __isoc99_vscanf (/home/ry/a.out+0x4ac38)
    #2 0x55efb85aed3e in __interceptor___isoc99_scanf (/home/ry/a.out+0x4ad3e)
    #3 0x55efb866a115 in main /home/ry/foo.c:14
    #4 0x7f5902ecebf6 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21bf6)
    #5 0x55efb856b9d9 in _start (/home/ry/a.out+0x79d9)

0x602000000012 is located 0 bytes to the right of 2-byte region [0x602000000010,0x602000000012)
allocated by thread T0 here:
    #0 0x55efb86297b0 in malloc (/home/ry/a.out+0xc57b0)
    #1 0x55efb866a0d1 in main /home/ry/foo.c:11
    #2 0x7f5902ecebf6 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21bf6)

SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/ry/a.out+0x49e69) in scanf_common(void*, int, bool, char const*, __va_list_tag*)
Shadow bytes around the buggy address:
  0x0c047fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c047fff8000: fa fa[02]fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==27085==ABORTING

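It appears to work because, most of the time, the memory just past the malloc()ed region is not in use, so the extra characters and the terminating null byte that scanf writes there are still intact when printf reads them.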

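scanf() can only take a maximum field width that is a constant in the format string (such as %4s), or else the "m" modifier (%ms), which exists precisely for this case: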

The %c, %s, and %[ conversion specifiers shall accept an optional assignment-allocation character 'm', which shall cause a memory buffer to be allocated to hold the string converted including a terminating null character [...] The system shall allocate a buffer as if malloc() had been called.

See also: