Structure over flexible array member

Question

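I'm writing a C program (g++ compilable) that has to deal with a lot of different structures, all coming from a buffer with a predefined format. The format specifies which type of structure I should load. This could be solved with unions, but the huge difference in structure sizes made me decide on a structure containing a void *: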

struct msg {
    int type;
    void * data; /* may be any of the 50 defined structures: @see type */
};

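The problem is that I need 2 malloc calls and 2 free calls. For me, function calls are expensive and malloc is expensive. From the user's side, it would be great to free a message with a single call. So I changed the definition to: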

struct msg {
    int type;
    uint8_t data[]; /* flexible array member */
};
...
struct msg_10 {
    uint32_t flags[4];
    ...
};

Whenever I need to deserialize a message, I do:

struct msg * deserialize_10(uint8_t * buffer, size_t length) {
    struct msg * msg = (struct msg *) malloc(offsetof(struct msg, data) + sizeof(struct msg_10));
    struct msg_10 * payload = (__typeof__(payload))msg->data;

    /* load payload */
    return msg;
}

And to get a member of the structure:

uint32_t msg10_flags(const struct msg * msg, int i)
{
    return ((struct msg_10 *)(msg->data))->flags[i];
}

With this change, gcc (and g++) issue a nice warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing] message.

I believe this is a common problem (though I found no answer here): how to represent a family of messages in C in a reasonably efficient way.

I understand why the warning appears; my questions are the following:

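Is it possible to implement something like this without the warning, or is it intrinsically flawed? (The "or" is not exclusive :P, and I'm almost convinced I should refactor.)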

Would it be better to represent the messages using something like the following code?

struct msg {
    int type;
};
...
struct msg_10 {
    struct msg; /* or int type; */
    uint32_t flags[4];
    ...
};

If yes, are there caveats? Can I always write and use the following?

struct msg * deserialize_10(uint8_t * buffer, size_t length) {
    struct msg_10 * msg = (struct msg_10 *) malloc(sizeof(struct msg_10));

    /* load payload */
    return (struct msg *)msg;
}

uint32_t msg10_flags(const struct msg * msg, int i) {
    const struct msg_10 * msg10 = (const struct msg_10 *) msg;
    return msg10->flags[i];
}

Any other options?

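I forgot to say that this runs on low-level systems and performance is a priority, but all in all the real issue is how to handle this "multi-message" structure. I could refactor once, but changing the deserialization code of 50 message types...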

Answer 1

I'm writing a C program (g++ compilable)

This is a misunderstanding.

C source files should be compiled by gcc (not by g++). C++ source files should be compiled by g++ (not by gcc). Remember that GCC means GNU Compiler Collection (and also contains gfortran, gccgo, etc. when suitably configured). So Fortran source files should be compiled with gfortran (if using GCC), Go source files with gccgo (if using GCC), Ada code with gnat (if using GCC), and so on.

Read about Invoking GCC. Check what happens by also passing -v to your gcc or g++ compiler command (it should invoke the cc1 compiler, not the cc1plus one).
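Besides reading the -v output, the source file itself can report which front end compiled it. The following is only a minimal sketch: the helper names compiled_as and report_language are made up for illustration, and the check relies solely on the standard __cplusplus macro, which a C++ compiler defines and a C compiler does not.

```c
#include <stdio.h>

/* Returns the language this translation unit was compiled as. */
static const char *compiled_as(void)
{
#ifdef __cplusplus
    return "C++";  /* a C++ front end (cc1plus with GCC) processed this file */
#else
    return "C";    /* a C front end (cc1 with GCC) processed this file */
#endif
}

/* Hypothetical helper: prints the result so that it shows up in a test log. */
static void report_language(void)
{
    printf("compiled as %s\n", compiled_as());
}
```

If this reports C++ for a file that is supposed to be C, the build rule for that file is the thing to fix.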

If you insist on compiling a C99 or C11 source file with g++ (not gcc), which IMHO is wrong and confusing, be sure to pass at least the -std=c99 (or -std=gnu11, etc.) and -x c flags.

But you really should fix your build automation or build procedure to use gcc (not g++) to compile C code. That is your real issue (some bug in a Makefile or something similar).

At link time, use g++ if you mix C and C++ code.

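Notice that flexible array members do not exist (and never existed) in C++, not even in C++20. In C++ you might use 0 as their declared size (zero-length arrays are a compiler extension, not standard C++), e.g. code: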

#ifdef __cplusplus
#define FLEXIBLE_DIM 0
#else
#define FLEXIBLE_DIM /*empty flexible array member*/
#endif

and then declare:

struct msg {
  int type;
  uint8_t data[FLEXIBLE_DIM]; /* flexible array member */
};

But this works only because uint8_t is a POD, and your g++ compiler might (sometimes) give "buffer overflow" or "index out of bounds" warnings (and you should never depend on the compile-time sizeof of that data field).
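To illustrate the last point, here is a minimal sketch of a single-allocation constructor that sizes the block with offsetof rather than with sizeof of the data field. The helper name msg_alloc and its parameters are assumptions made for this example, not part of the question's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#ifdef __cplusplus
#define FLEXIBLE_DIM 0
#else
#define FLEXIBLE_DIM /* empty: flexible array member */
#endif

struct msg {
    int type;
    uint8_t data[FLEXIBLE_DIM]; /* payload bytes follow the header */
};

/* Hypothetical helper: one malloc for header plus payload, so one free suffices. */
static struct msg *msg_alloc(int type, const void *payload, size_t payload_size)
{
    /* offsetof gives the header size without relying on sizeof(data). */
    struct msg *m = (struct msg *)malloc(offsetof(struct msg, data) + payload_size);
    if (m == NULL)
        return NULL;
    m->type = type;
    if (payload_size != 0)
        memcpy(m->data, payload, payload_size);
    return m;
}
```

The caller releases the whole message with a single free(m). Note that data starts right after the int, so a payload type that needs stricter alignment than int may not be suitably aligned at that address.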

Answer 2

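You can certainly build something like this using macros; message_header acts as the parent struct for all message types. Because it is the first member of each such struct, the header and the enclosing struct share the same address. Therefore, after creating a msg(int) and casting it to message_header, you can free it simply by calling free. (C++ works somewhat the same way, by the way.)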

Is this what you want?

#include <stdlib.h>

struct message_header {
    int type;
};

/* Each expansion declares a new, distinct anonymous struct type. */
#define msg(T) struct {struct message_header header; T data;}

struct message_header* init_msg_int(int a) {
    /* No cast on malloc: two msg(int) expansions are different types in C. */
    msg(int)* result = malloc(sizeof *result);
    if (result == NULL)
        return NULL;
    result->data = a;
    return (struct message_header*)result;
}

int get_msg_int(struct message_header* msg) {
    return ((msg(int)*)msg)->data;
}

void free_msg(struct message_header* msg) {
    free(msg);
}
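The same idea can be written with named structs, which is closer to the second layout proposed in the question. This is only a sketch under stated assumptions: the type tag value 10, the helper names msg10_new and msg10_flag, and the error convention (-1 on mismatch) are all invented for illustration. It relies on the C rule that a pointer to a struct, suitably converted, points to its first member and vice versa.

```c
#include <stdint.h>
#include <stdlib.h>

struct message_header {
    int type;
};

struct msg_10 {
    struct message_header header; /* must stay the first member */
    uint32_t flags[4];
};

/* Hypothetical constructor: a single allocation, returned as the common header. */
static struct message_header *msg10_new(const uint32_t flags[4])
{
    struct msg_10 *m = (struct msg_10 *)malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->header.type = 10;
    for (int i = 0; i < 4; i++)
        m->flags[i] = flags[i];
    return &m->header;
}

/* Hypothetical accessor: checks the tag before converting back to msg_10. */
static int msg10_flag(const struct message_header *h, int i, uint32_t *out)
{
    if (h == NULL || h->type != 10 || i < 0 || i >= 4)
        return -1;
    *out = ((const struct msg_10 *)h)->flags[i];
    return 0;
}
```

Because the object really is a struct msg_10, the cast back is an access through the object's own type, and a plain free on the header pointer releases the whole message.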

Answer 3

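To dodge strict aliasing problems, you can wrap your struct inside a union. With C11 you can use an anonymous struct to get rid of the extra level needed to access "flags":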

typedef union
{
  struct
  {
    uint32_t flags[4];
  };  
  uint8_t bytes[ sizeof(uint32_t[4]) ];
} msg_10;

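Now you can do msg_10* payload = (msg_10*)msg->data; and access payload without worrying about strict aliasing violations, since the union type includes a type (uint8_t[]) that is compatible with the effective type of the object.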

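Please note, however, that the memory returned by malloc has no effective type until you access it through a pointer to a certain type. So, alternatively, you can make sure to access the data with the correct type right after malloc, and that won't give a strict aliasing violation either. Something like: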

struct msg_10 * msg = malloc(sizeof(struct msg_10));
struct msg_10 dummy = *msg;

Here dummy won't be used; it is just there to set the effective type.
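A third option that avoids the question altogether is to never dereference a converted pointer and to copy bytes with memcpy instead. As far as I know, the C11 effective-type rules (6.5p6) say that for allocated storage it is a store, or a copy via memcpy, that establishes the effective type, so a copy-based accessor is on safe ground. The sketch below is C only (C99 or later); the helper name msg10_create is an assumption, while msg10_flags mirrors the accessor from the question.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct msg {
    int type;
    uint8_t data[]; /* flexible array member */
};

struct msg_10 {
    uint32_t flags[4];
};

/* Hypothetical constructor: copies the payload bytes into the message. */
static struct msg *msg10_create(const struct msg_10 *payload)
{
    struct msg *m = malloc(offsetof(struct msg, data) + sizeof *payload);
    if (m == NULL)
        return NULL;
    m->type = 10;
    memcpy(m->data, payload, sizeof *payload);
    return m;
}

/* Reads one flag by copying its bytes out; no type-punned pointer is dereferenced. */
static uint32_t msg10_flags(const struct msg *m, int i)
{
    uint32_t flag;
    memcpy(&flag,
           m->data + offsetof(struct msg_10, flags) + (size_t)i * sizeof flag,
           sizeof flag);
    return flag;
}
```

Optimizing compilers commonly turn such a fixed-size memcpy into a plain load, so this need not cost a function call, but that is worth confirming in the generated code for the target in question.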

Answer 4

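No mallocs, no frees, no aliasing; the function calls are inlined, and for simple or non-padded, naturally aligned structs the inline functions are equivalent to a memcpy or a register copy of a small simple struct. For more complex structures the compiler does all the heavy lifting.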

Since you are deserializing, the bytes in the buffer may be packed and not naturally aligned. Have a look at packed_struct.h in the Linux kernel source (https://elixir.bootlin.com/linux/v3.8/source/include/linux/unaligned/packed_struct.h).

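Instead of u16, u32 and u64, roll out a function for each of msg_0..msg_10..msg_(n-1). As the source file shows, the pattern is ripe for a few simple macros that generate each unaligned type and its inline function. Using your example names: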

struct msg {
    int type;
};
...
struct msg_10 {
    struct msg MsgStruct; /* or int type; */
    uint32_t flags[4];
    ...
};

#define UNALIGNED_STRUCT_NAME(msg_struct_tag) \
    UNA_##msg_struct_tag

#define DECLARE_UNALIGNED_STRUCT(msg_struct_tag) \
  struct UNALIGNED_STRUCT_NAME(msg_struct_tag) \
  {struct msg_struct_tag x;} __attribute__((__packed__))

#define DESERIALIZE_FN_NAME(msg_struct_tag) \
    deserialize_##msg_struct_tag

#define CALL_DESERIALIZE_STRUCT_FN(msg_struct_tag, pbuf) \
    DESERIALIZE_FN_NAME(msg_struct_tag)(pbuf)

#define DEFINE_DESERIALIZE_STRUCT_FN(msg_struct_tag) \
    static inline \
        struct msg_struct_tag DESERIALIZE_FN_NAME(msg_struct_tag)(const void* p) \
    { \
        const struct UNALIGNED_STRUCT_NAME(msg_struct_tag) *ptr = \
            (const struct UNALIGNED_STRUCT_NAME(msg_struct_tag) *)p; \
        return ptr->x; \
    }

...
DECLARE_UNALIGNED_STRUCT(msg_9);
DECLARE_UNALIGNED_STRUCT(msg_10);
DECLARE_UNALIGNED_STRUCT(msg_11);
...
...
DEFINE_DESERIALIZE_STRUCT_FN(msg_9)
DEFINE_DESERIALIZE_STRUCT_FN(msg_10)
DEFINE_DESERIALIZE_STRUCT_FN(msg_11)
...

To deserialize message 10:

struct msg_10 ThisMsg = CALL_DESERIALIZE_STRUCT_FN(msg_10, buffer);

Or to deserialize message 13 at byte 9 in the buffer:

struct msg_13 OtherMsg = CALL_DESERIALIZE_STRUCT_FN(msg_13, &(buffer[9]));
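For reference, this is roughly what the macros expand to for msg_10, together with a small read from an odd offset. It is a sketch under assumptions: the struct is simplified to an int tag plus four flags, demo_unaligned_read is a made-up helper, __attribute__((__packed__)) is a GCC/Clang extension, and the bytes in the buffer are assumed to have exactly the in-memory layout and byte order of struct msg_10.

```c
#include <stdint.h>
#include <string.h>

struct msg_10 {
    int type;
    uint32_t flags[4];
};

/* Packed wrapper: tells the compiler the struct may sit at any byte address. */
struct UNA_msg_10 {
    struct msg_10 x;
} __attribute__((__packed__));

/* Copies a msg_10 out of a possibly unaligned location. */
static inline struct msg_10 deserialize_msg_10(const void *p)
{
    const struct UNA_msg_10 *ptr = (const struct UNA_msg_10 *)p;
    return ptr->x;
}

/* Hypothetical demo: places a message at byte offset 1 and reads it back. */
static struct msg_10 demo_unaligned_read(void)
{
    struct msg_10 src = { 10, { 1, 2, 3, 4 } };
    uint8_t buffer[1 + sizeof src];

    buffer[0] = 0xFF;                      /* one leading byte forces misalignment */
    memcpy(&buffer[1], &src, sizeof src);  /* simulate bytes arriving on the wire */
    return deserialize_msg_10(&buffer[1]);
}
```

The result is returned by value, so there is nothing to free. The Linux kernel, where this pattern comes from, is built with -fno-strict-aliasing; in code compiled with strict aliasing enabled, a memcpy into a local struct is the more conservative way to get the same effect.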
