C: Writing 4 bytes into a region of size 3 overflows the destination?

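My simple C program is shown below. Initially, I defined the variable buf1 with room for 3 characters.

I have no problem with 2 characters, such as AB or XY: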

user@linux:~/c# cat buff.c; gcc buff.c -o buff; echo -e '\n'; ./buff
#include <stdio.h>
#include <string.h>

int main() {
        char buf1[3] = "AB";
        printf("buf1 val:  %s\n", buf1);
        printf("buf1 addr: %p\n", &buf1);
        strcpy(buf1,"XY");
        printf("buf1 val:  %s\n", buf1);
}

buf1 val:  AB
buf1 addr: 0xbfe0168d
buf1 val:  XY
user@linux:~/c# 

Unfortunately, when I use 3 characters, e.g. XYZ, I get the following message when compiling the program.

buff.c:8:2: warning: ‘__builtin_memcpy’ writing 4 bytes into a region of size 3 overflows the destination [-Wstringop-overflow=]
  strcpy(buf1,"XYZ");

Isn't XYZ 3 bytes? Why does the message say 4 bytes instead of 3 bytes?

user@linux:~/c# cat buff.c; gcc buff.c -o buff; echo -e '\n'; ./buff
#include <stdio.h>
#include <string.h>

int main() {
        char buf1[3] = "AB";
        printf("buf1 val:  %s\n", buf1);
        printf("buf1 addr: %p\n", &buf1);
        strcpy(buf1,"XYZ");
        printf("buf1 val:  %s\n", buf1);
}buff.c: In function ‘main’:
buff.c:8:2: warning: ‘__builtin_memcpy’ writing 4 bytes into a region of size 3 overflows the destination [-Wstringop-overflow=]
  strcpy(buf1,"XYZ");
  ^~~~~~~~~~~~~~~~~~


buf1 val:  AB
buf1 addr: 0xbfdb34fd
buf1 val:  XYZ
Segmentation fault
user@linux:~/c# 

If you look at a typical implementation of strcpy, you will see that it depends on the null character:

char *strcpy(char *d, const char *s)
{
   char *saved = d;
   while (*s != '\0')
   {
       *d++ = *s++;
   }
   *d = 0;
   return saved;
}

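So for char arr[3], if you try to store a three-character sequence, there is no room left for the '\0': strcpy writes it one element past the end of the array. And if a string has no terminator at all, a loop like this keeps going past the end of the array until it happens to find a zero byte, which is a buffer overflow.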

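You forgot that C strings are null-terminated. Because of the implicit terminating byte, sizeof "AB" is 3 and sizeof "XYZ" is 4. (The type of the string literal "AB" is char[3], and the type of "XYZ" is char[4].)

If you did not specify any length for buf1, it would also be 3 bytes long: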

char buf1[] = "AB";  // here exactly the same as char buf1[3] = "AB";

The memory layout is

buf1
  v
  +-------+-------+-------+
  |  [0]  |  [1]  |  [2]  |
  +-------+-------+-------+
  |  'A'  |  'B'  |  '\0' |
  +-------+-------+-------+

Now, strcpy copies the terminating null character as well (C11 7.24.2.3p2):

  2. The strcpy function copies the string pointed to by s2 (including the terminating null character) into the array pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.

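This means that 4 bytes are copied in total, but there is space for only 3 characters, so the code has undefined behavior and the compiler produces a diagnostic message. C11 7.1.4 Use of library functions p.2: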

[...] If a function argument is described as being an array, the pointer actually passed to the function shall have a value such that all address computations and accesses to objects (that would be valid if the pointer did point to the first element of such an array) are in fact valid.[...]

In the actual code, the implicit access to buf1[3] is in fact not valid.

The memory layout after the strcpy:

buf1
  v
  +-------+-------+-------+-------+
  |  [0]  |  [1]  |  [2]  |  ???  |
  +-------+-------+-------+-------+
  |  'X'  |  'Y'  |  'Z'  |  '\0' |
  +-------+-------+-------+-------+

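The warning mentions __builtin_memcpy because the C compiler optimizes this code heavily: it replaces the strcpy of a string of known length with a memcpy of known length, because memcpy generates more efficient code.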


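Finally, you can use strncpy to put 3 characters into char buf1[3];, but the buffer then has no room for the terminating null character, so it cannot be printed with a plain printf("%s"). You can print it by specifying an explicit precision less than or equal to the array length; if you also give a field width, shorter values are padded:

#include <stdio.h>
#include <string.h>

int main(void) {
    char buf1[3] = "AB";
    /* %-3.3s: left-justify, pad to 3 columns, read at most 3 bytes */
    printf("buf1 val:  >%-3.3s<\n", buf1);
    printf("buf1 addr: %p\n", (void *)&buf1);
    strncpy(buf1, "XYZ", 3);   /* copies 3 bytes, stores no '\0' */
    printf("buf1 val:  >%-3.3s<\n", buf1);
}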

Compile and run it:

% gcc strncpy.c -Wall -Wextra
% ./a.out
buf1 val:  >AB <
buf1 addr: 0x7ffd7f6aecc5
buf1 val:  >XYZ<

Note, however, that an extra space character was printed after AB: that is the padding.

7.24.2.3p2 on strcpy:

The strcpy function copies the string pointed to by s2 (including the terminating null character) into the array pointed to by s1.

3 characters + '\0' == 4 characters

You will also get 4 from:

printf("%zu\n", sizeof "ABC");

because string literals are basically anonymous char array literals with static storage, essentially equivalent to:

 static char const __anonymous[]="ABC"; /*the size gets inferred*/

 (char const[]){"ABC"};

(with the historical caveat that the const is not actually there, but for all intents and purposes you should pretend it is)