Multiple structures in a single malloc invoking undefined behaviour?

Question

The page Use the correct syntax when declaring a flexible array member says that, when malloc is used for a header plus flexible data and data[1] is hacked into the struct,

This example has undefined behavior when accessing any element other than the first element of the data array. (See the C Standard, 6.5.6.) Consequently, the compiler can generate code that does not return the expected value when accessing the second element of data.

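I looked up C Standard 6.5.6 and could not see how this would produce undefined behaviour. I have used a pattern I am more comfortable with, where the header is implicitly followed by the data, using the same sort of malloc,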

#include <stdlib.h> /* EXIT malloc free */
#include <stdio.h>  /* printf */
#include <string.h> /* strlen memcpy */

struct Array {
    size_t length;
    char *array;
}; /* +(length + 1) char */

static struct Array *Array(const char *const str) {
    struct Array *a;
    size_t length;
    length = strlen(str);
    if(!(a = malloc(sizeof *a + length + 1))) return 0;
    a->length = length;
    a->array = (char *)(a + 1); /* UB? */
    memcpy(a->array, str, length + 1);
    return a;
}

/* Take a char off the end just so that it's useful. */
static void Array_to_string(const struct Array *const a, char (*const s)[12]) {
    const int n = a->length ? a->length > 9 ? 9 : (int)a->length - 1 : 0;
    sprintf(*s, "<%.*s>", n, a->array);
}

int main(void) {
    struct Array *a = 0, *b = 0;
    int is_done = 0;
    do { /* Try. */
        char s[12], t[12];
        if(!(a = Array("Foo!")) || !(b = Array("To be or not to be."))) break;
        Array_to_string(a, &s);
        Array_to_string(b, &t);
        printf("%s %s\n", s, t);
        is_done = 1;
    } while(0); if(!is_done) {
        perror(":(");
    } {
        free(a);
        free(b);
    }
    return is_done ? EXIT_SUCCESS : EXIT_FAILURE;
}

prints,

<Foo> <To be or >

The compliant solution uses a C99 flexible array member. The page also says,

Failing to use the correct syntax when declaring a flexible array member can result in undefined behavior, although the incorrect syntax will work on most implementations.

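Technically, does this C90 code produce undefined behaviour too? If not, what is the difference? (Or is the Carnegie Mellon wiki incorrect?) What property of an implementation would cause this to fail?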

Answer 1

This should be well defined:

a->array = (char *)(a + 1);

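because you are creating a pointer to one element past the end of an array of size 1 without dereferencing it. And because a->array now points to bytes that do not yet have an effective type, you can use them safely.

However, this only works because you are using the bytes that follow as an array of char. If you instead tried to create an array of some other type whose size is greater than 1, you could have alignment issues.

For example, if you compiled a program for ARM with 32-bit pointers and you had this: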

struct Array {
    int size;
    uint64_t *a;
};
...
struct Array *a = malloc(sizeof *a + (length * sizeof(uint64_t)));
a->size = length;
a->a = (uint64_t *)(a + 1);       // misaligned pointer
a->a[0] = 0x1111222233334444ULL;  // misaligned write

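Your program could crash because of the misaligned write. So in general you should not depend on this. It is best to stick with a flexible array member, which the standard guarantees will work.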
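To make the recommendation concrete, here is a minimal sketch of the same idea with a flexible array member. The names struct FlexArray and flex_array_create are hypothetical and not from the original answer, and the _Alignof check assumes a C11 compiler (the flexible array member itself only needs C99).

```c
#include <assert.h>
#include <stddef.h> /* offsetof */
#include <stdint.h> /* uint64_t SIZE_MAX */
#include <stdlib.h> /* malloc free */

/* Hypothetical flexible-array-member counterpart of the struct above. */
struct FlexArray {
    size_t length;
    uint64_t data[]; /* flexible array member; must be the last member */
};

/* Allocates the header and `length` elements in one block.
   Returns NULL on failure; release the result with free(). */
struct FlexArray *flex_array_create(size_t length) {
    struct FlexArray *a;
    /* The compiler lays out the struct so that data is suitably aligned. */
    assert(offsetof(struct FlexArray, data) % _Alignof(uint64_t) == 0);
    /* Reject lengths whose byte count would overflow size_t. */
    if(length > (SIZE_MAX - sizeof *a) / sizeof a->data[0]) return NULL;
    a = malloc(sizeof *a + length * sizeof a->data[0]);
    if(!a) return NULL;
    a->length = length;
    return a;
}
```

Because the array is a real member, no pointer has to be computed by hand, and a->data[i] is valid for every i below length.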

Answer 2

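To add to the good answer above, one way around the alignment issue is to use a union. This ensures that &p[1] is properly aligned for (uint64_t*)¹. Unlike sizeof *a, sizeof *p includes any padding needed.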

  union {
    struct Array header;
    uint64_t dummy;
  } *p;
  p = malloc(sizeof *p + length * sizeof *p->header.array);

  struct Array *a = (struct Array *)&p[0]; // or = &(p->header);
  a->length = length;
  a->array = (uint64_t*) &p[1]; // or &p[1].dummy;

Or go with C99 and a flexible array member.

¹ As well as for struct Array.
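The union approach can be sketched as a complete function. This is only an illustration under assumptions: union ArrayBlock and array_create are hypothetical names, struct Array is assumed to hold a length and a uint64_t pointer as in the snippet above, and an overflow check has been added that the answer does not mention.

```c
#include <assert.h>
#include <stdint.h> /* uint64_t SIZE_MAX */
#include <stdlib.h> /* malloc free */

struct Array {
    size_t length;
    uint64_t *array;
};

/* Hypothetical named version of the anonymous union in the answer. Its size
   is a multiple of the alignment of both members, so the address just past
   one union object is suitably aligned for uint64_t. */
union ArrayBlock {
    struct Array header;
    uint64_t dummy;
};

/* Allocates the header and `length` elements in one block.
   Returns NULL on failure; release the result with free(). */
struct Array *array_create(size_t length) {
    union ArrayBlock *p;
    assert(sizeof *p % _Alignof(uint64_t) == 0); /* _Alignof assumes C11 */
    /* Reject lengths whose byte count would overflow size_t. */
    if(length > (SIZE_MAX - sizeof *p) / sizeof(uint64_t)) return NULL;
    p = malloc(sizeof *p + length * sizeof(uint64_t));
    if(!p) return NULL;
    p->header.length = length;
    p->header.array = (uint64_t *)(p + 1); /* aligned, unlike header + 1 might be */
    /* The header sits at the start of the block, so this pointer can be freed. */
    return &p->header;
}
```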

Answer 3

Before the publication of C89, some implementations would attempt to identify and trap out-of-bounds array accesses. Given something like:

struct foo {int a[4],b[4];} *p;

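such an implementation would squawk at an attempt to access p->a[i] if i was not in the range 0 to 3. For programs that never need to index the array-type lvalue p->a to reach anything outside that array, being able to trap such out-of-bounds accesses would be useful.

The authors of C89 were also almost certainly aware that programs commonly used the address of a dummy-sized array at the end of a structure as a means of accessing storage beyond the structure. Such techniques made it possible to do things that could not be done nearly as well otherwise, and part of the Spirit of C, according to the authors of the Standard, is "Don't prevent the programmer from doing what needs to be done".

Consequently, the authors of the Standard treated such accesses as something that implementations could support or not at their leisure, presumably based on what would be most useful to their customers. While it would often be helpful for implementations that normally bounds-check accesses to arrays within structures to omit such checks when the last item of an indirectly accessed structure is an array with one element (or, if they extend the language to waive a compile-time constraint, zero elements), people writing such implementations were presumably capable of recognizing such things without the authors of the Standard having to tell them. The notion that "Undefined Behavior" was intended as some form of prohibition does not seem to have really taken hold until after the publication of the successor standard to C89.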

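With regard to your example, having a pointer within a structure to storage later in the same allocation should work, but with a couple of caveats:

If the allocation is passed to realloc, the pointer within it becomes invalid.
The only real advantage of using a pointer rather than a flexible array member is that the pointer can be made to point somewhere else. That may be fine if the only kind of "something else" will always be a constant object of static duration that never has to be freed, or some other kind of object that does not need to be freed, but it could be problematic if the pointer may hold the only reference to something stored in a separate allocation.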
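The realloc caveat can be illustrated with a small sketch. It assumes the pointer always refers to the bytes directly after the header, as in the question's code; array_create and array_resize are hypothetical helper names, not part of the question.

```c
#include <assert.h>
#include <stdlib.h> /* malloc realloc free */
#include <string.h> /* strlen memcpy */

struct Array {
    size_t length;
    char *array;
};

/* Same layout as the question's constructor: header, then the string. */
struct Array *array_create(const char *str) {
    const size_t length = strlen(str);
    struct Array *a = malloc(sizeof *a + length + 1);
    if(!a) return NULL;
    a->length = length;
    a->array = (char *)(a + 1);
    memcpy(a->array, str, length + 1);
    return a;
}

/* Resizes the block so it can hold `new_length` chars plus a terminator.
   Returns the possibly moved block, or NULL with the original untouched. */
struct Array *array_resize(struct Array *a, size_t new_length) {
    struct Array *b;
    assert(a != NULL);
    /* Reject sizes that would overflow size_t. */
    if(new_length > (size_t)-1 - sizeof *a - 1) return NULL;
    b = realloc(a, sizeof *b + new_length + 1);
    if(!b) return NULL;
    /* The block may have moved, so the stored pointer is stale: recompute it. */
    b->array = (char *)(b + 1);
    /* When shrinking, truncate the string to fit. */
    if(new_length < b->length) b->length = new_length;
    b->array[b->length] = '\0';
    return b;
}
```

With a flexible array member the fix-up line disappears, since the array's address is derived from the struct's address on every access.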

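Flexible array members were available as an extension in some compilers before C89 was written, and were officially added in C99. Any decent compiler should support them.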

Answer 4

You can define the structure Array as:

struct Array
{
    size_t length;
    char array[1];
}; /* +(length + 1) char */

and then malloc( sizeof *a + length ). The "+1" element is in the array[1] member. Fill the structure with:

a->length = length;
strcpy( a->array, str );

c

c89

language-lawyer

flexible-array-member