Does FBString's small string optimization rely on undefined behavior?

Facebook's fbstring_core class uses the "Small String Optimization" described in this talk, wherein the storage for the class's data members (a Char*, size and capacity) is repurposed to hold character data if the string is sufficiently small. The flag bits used to distinguish between these cases are located in the "rightmost char of the storage". My question is: does reading these bits through the bytes_ union member, which is never actually written, constitute undefined behavior under the C++11 standard? The answer to "Accessing inactive union member and undefined behavior?" suggests that it does.

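The following excerpt contains the declarations of these members, along with the category() member function used to determine whether the optimization is in effect.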

    typedef uint8_t category_type;

    enum class Category : category_type {
      isSmall = 0,
      isMedium = kIsLittleEndian ? 0x80 : 0x2,
      isLarge = kIsLittleEndian ? 0x40 : 0x1,
    };

    Category category() const {
      // works for both big-endian and little-endian
      return static_cast<Category>(bytes_[lastChar] & categoryExtractMask);
    }

    struct MediumLarge {
      Char * data_;
      size_t size_;
      size_t capacity_;

      size_t capacity() const {
        return kIsLittleEndian
          ? capacity_ & capacityExtractMask
          : capacity_ >> 2;
      }

      void setCapacity(size_t cap, Category cat) {
        capacity_ = kIsLittleEndian
            ? cap | (static_cast<size_t>(cat) << kCategoryShift)
            : (cap << 2) | static_cast<size_t>(cat);
      }
    };

    union {
      uint8_t bytes_[sizeof(MediumLarge)]; // For accessing the last byte.
      Char small_[sizeof(MediumLarge) / sizeof(Char)];
      MediumLarge ml_;
    };

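It appears that this implementation relies on "type punning" to access a byte that may actually belong to the size_t capacity_ member. From the answer to the question linked above, I gather that this is defined behavior in C99 but not in C++11.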

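Not only does this appear to be UB, it is also entirely unnecessary: the only use of bytes_ seems to be reading the last byte of *this, which can be done without UB: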

    reinterpret_cast<const char*>(this)[sizeof(*this) - 1]

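This works thanks to the special exemption in C++'s aliasing rules, which allows the bytes of any object to be read through a pointer to char or unsigned char.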
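As a minimal sketch of that idea, the snippet below reads the last byte of an object without going through a union at all. Storage, lastByte and lastByteMemcpy are hypothetical names chosen for illustration and are not part of folly. The struct only mimics the three-member layout of MediumLarge, and the sketch assumes there is no trailing padding after capacity_, which holds on mainstream platforms where a pointer and size_t have the same size.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the MediumLarge layout; not folly's actual code.
struct Storage {
  char* data_;
  std::size_t size_;
  std::size_t capacity_;
};

// Reads the last byte of the object representation through unsigned char*,
// which the aliasing rules permit for an object of any type.
inline std::uint8_t lastByte(const Storage& s) {
  const unsigned char* p = reinterpret_cast<const unsigned char*>(&s);
  return p[sizeof(Storage) - 1];
}

// Alternative: copy the object representation into a byte buffer first.
// This avoids the pointer cast altogether.
inline std::uint8_t lastByteMemcpy(const Storage& s) {
  unsigned char buf[sizeof(Storage)];
  std::memcpy(buf, &s, sizeof(Storage));
  return buf[sizeof(Storage) - 1];
}
```

Both helpers return the same value for any given object. Which bits of capacity_ end up in that byte still depends on endianness, which is why the original code selects its masks and shifts with kIsLittleEndian.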