Is FlatBuffers C++ reinterpret_cast access actually undefined behavior? Is it practically OK to do that?

I recently tried using FlatBuffers in C++. I found that FlatBuffers seems to use a lot of type punning through things like reinterpret_cast. This makes me a little uncomfortable, because I've learned that it is undefined behavior in many cases.

For example, Rect in the fbs file:

struct Rect {
    left:int;
    top:int;
    right:int;
    bottom:int;
}

becomes this C++ code for reading it from a table:

  const xxxxx::Rect *position() const {
    return GetStruct<const xxxxx::Rect *>(VT_POSITION);
  }

The definition of GetStruct simply uses reinterpret_cast.

My questions are:

  1. Is this really undefined behavior in C++?
  2. In practice, will this kind of usage really cause problems?

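Update:

The buffer can come from the network or from disk. I don't know whether it makes a difference if the buffer actually comes from the same memory, written by a writer in the same C++ program.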

But the writer's auto-generated method is:

  void add_position(const xxxxx::Rect *position) {
    fbb_.AddStruct(Char::VT_POSITION, position);
  }

This uses this method and this method, so it relies on reinterpret_cast as well.

I think the FlatBuffers "Use in C++" page answers both of your questions:

Direct memory access

As you can see from the above examples, all elements in a buffer are accessed through generated accessors. This is because everything is stored in little endian format on all platforms (the accessor performs a swap operation on big endian machines), and also because the layout of things is generally not known to the user.

For structs, layout is deterministic and guaranteed to be the same across platforms (scalars are aligned to their own size, and structs themselves to their largest member), and you are allowed to access this memory directly by using sizeof() and memcpy on the pointer to a struct, or even an array of structs.

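These paragraphs guarantee that, given a valid flatbuffer, all memory accesses are valid, because the memory at that particular location will match the expected layout.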
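To make the "sizeof() and memcpy" route from the quoted passage concrete, here is a minimal sketch that copies a struct out of a byte buffer instead of dereferencing a cast pointer. The Rect type below is a hand-written stand-in for the generated one, and load_rect / store_rect are hypothetical helpers, not part of FlatBuffers. It also assumes a little-endian host and that the offset has already been checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hand-written stand-in for the generated struct (assumption): four 32-bit
// ints, each aligned to its own size, so there is no padding.
struct Rect {
    std::int32_t left;
    std::int32_t top;
    std::int32_t right;
    std::int32_t bottom;
};
static_assert(sizeof(Rect) == 16, "Rect must match the 16-byte schema layout");

// Hypothetical helper: copy a Rect out of a raw byte buffer.
// memcpy has no alignment or aliasing requirements on the source bytes.
// Assumes a little-endian host and an already validated offset.
inline Rect load_rect(const std::uint8_t* buf, std::size_t offset) {
    assert(buf != nullptr);
    Rect r;
    std::memcpy(&r, buf + offset, sizeof(Rect));
    return r;
}

// Hypothetical helper for the demonstration: write a Rect's bytes into a buffer.
inline void store_rect(std::uint8_t* buf, std::size_t offset, const Rect& r) {
    assert(buf != nullptr);
    std::memcpy(buf + offset, &r, sizeof(Rect));
}
```

Compilers typically turn a fixed-size memcpy like this into plain loads, so the copy is usually not a performance concern.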

If you are processing untrusted flatbuffers, you first need to use the verifier function to ensure that the flatbuffer is valid:

This verifier will check all offsets, all sizes of fields, and null termination of strings to ensure that when a buffer is accessed, all reads will end up inside the buffer.
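As an illustration of the kind of check described there, the sketch below refuses to read a struct unless the whole read stays inside the buffer. This is not the FlatBuffers verifier and does not use its API. Rect is a hand-written stand-in for the generated struct, and read_rect_checked is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hand-written stand-in for the generated struct (assumption).
struct Rect {
    std::int32_t left;
    std::int32_t top;
    std::int32_t right;
    std::int32_t bottom;
};

// Hypothetical helper: read a Rect at `offset` only if all of its bytes lie
// inside [buf, buf + size). Returns false and leaves `out` untouched otherwise.
inline bool read_rect_checked(const std::uint8_t* buf, std::size_t size,
                              std::size_t offset, Rect& out) {
    assert(buf != nullptr);
    // Compare against size - sizeof(Rect) so that offset + sizeof(Rect)
    // can never overflow.
    if (size < sizeof(Rect) || offset > size - sizeof(Rect)) {
        return false;
    }
    std::memcpy(&out, buf + offset, sizeof(Rect));
    return true;
}
```

The real verifier walks the whole buffer once up front, so the generated accessors do not need a per-read check like this.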

I have not analyzed the whole FlatBuffers source code, but I did not see where these objects are created: I see no new-expression that would create the P objects here:

template<typename P> P GetStruct(voffset_t field) const {
  auto field_offset = GetOptionalFieldOffset(field);
  auto p = const_cast<uint8_t *>(data_ + field_offset);
  return field_offset ? reinterpret_cast<P>(p) : nullptr;
}

So this code does indeed seem to have undefined behavior.

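However, this is true only for C++17 (or earlier). C++20 introduces implicit-lifetime objects (for example, scalar types and aggregates are implicit-lifetime types). If the type P points to is an implicit-lifetime type, then this code can be well-defined, provided that the same memory region is always accessed through the same type, so that the type punning rules are not violated.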
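Here is a small sketch of the C++20 situation as I understand it. Operations such as std::malloc and std::memcpy are specified to implicitly create objects of implicit-lifetime types in the storage they return or write to, so a pointer to that storage may then be used as a Rect*. The Rect type is a hand-written stand-in for the generated struct and rect_from_bytes is a hypothetical helper. The same source compiles as C++17, but there the access is formally undefined.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <type_traits>

// Hand-written stand-in for the generated struct (assumption).
// As an aggregate of scalars it is an implicit-lifetime type in C++20.
struct Rect {
    std::int32_t left;
    std::int32_t top;
    std::int32_t right;
    std::int32_t bottom;
};
static_assert(std::is_trivially_copyable<Rect>::value,
              "memcpy into a Rect requires a trivially copyable type");
static_assert(std::is_standard_layout<Rect>::value,
              "byte layout must be predictable");

// Hypothetical helper: copy sizeof(Rect) bytes into freshly allocated storage
// and hand the storage back as a Rect*. Under the C++20 rules std::malloc
// implicitly creates a Rect object in the storage if that gives the program
// defined behavior, so reading through the returned pointer is valid.
// The caller releases the storage with std::free.
inline Rect* rect_from_bytes(const unsigned char* bytes) {
    assert(bytes != nullptr);
    void* storage = std::malloc(sizeof(Rect));  // suitably aligned for Rect
    if (storage == nullptr) {
        return nullptr;
    }
    std::memcpy(storage, bytes, sizeof(Rect));
    return static_cast<Rect*>(storage);
}
```

Whether a buffer filled by a network or file read gets the same treatment depends on how that storage was obtained, so this sketch only covers the malloc case.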