Can I use memcpy to write to multiple adjacent Standard Layout sub-objects?

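Disclaimer: This is an attempt to drill down into a larger problem, so please don't get hung up on whether the example makes sense in practice.

And yes, if you want to copy objects, please use/provide a copy constructor. (But note that even the example does not copy a whole object; it tries to blit some memory over a few adjacent (Q.2) integers.)


Given a C++ Standard Layout struct, can I use memcpy to write to multiple (adjacent) sub-objects at once?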

Complete example: (https://ideone.com/1lP2Gd https://ideone.com/YXspBk)

#include <vector>
#include <iostream>
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <string.h>

struct MyStandardLayout {
    char mem_a;
    int16_t num_1;
    int32_t num_2;
    int64_t num_3;
    char mem_z;

    MyStandardLayout()
    : mem_a('a')
    , num_1(1 + (1 << 14))
    , num_2(1 + (1 << 30))
    , num_3(1LL + (1LL << 62))
    , mem_z('z')
    { }

    void print() const {
        std::cout << 
            "MySL Obj: " <<
            mem_a << " / " <<
            num_1 << " / " <<
            num_2 << " / " <<
            num_3 << " / " <<
            mem_z << "\n";
    }
};

void ZeroInts(MyStandardLayout* pObj) {
    const size_t first = offsetof(MyStandardLayout, num_1);
    const size_t third = offsetof(MyStandardLayout, num_3);
    std::cout << "ofs(1st) =  " << first << "\n";
    std::cout << "ofs(3rd) =  " << third << "\n";
    assert(third > first);
    const size_t delta = third - first;
    std::cout << "delta =  " << delta << "\n";
    const size_t sizeAll = delta + sizeof(MyStandardLayout::num_3);
    std::cout << "sizeAll =  " << sizeAll << "\n";

    std::vector<char> buf( sizeAll, 0 );
    memcpy(&pObj->num_1, &buf[0], sizeAll);
}

int main()
{
    MyStandardLayout obj;
    obj.print();
    ZeroInts(&obj);
    obj.print();

    return 0;
}

Given the wording in the C++ Standard:

9.2 Class Members

...

13 Nonstatic data members of a (non-union) class with the same access control (Clause 11) are allocated so that later members have higher addresses within a class object. (...) Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other; (...)

I would conclude that num_1 through num_3 are guaranteed to have increasing addresses and to be adjacent, modulo padding.
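The ordering part of that conclusion can be checked at compile time. The following is only a sketch: LayoutProbe is a hypothetical stand-in with the same members as MyStandardLayout (constructor omitted), and span_bytes / padding_bytes are made-up helper names. The amount of padding is implementation-defined, so the code computes it rather than assuming a value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in with the same data members as MyStandardLayout.
struct LayoutProbe {
    char mem_a;
    std::int16_t num_1;
    std::int32_t num_2;
    std::int64_t num_3;
    char mem_z;
};

// offsetof is only unconditionally supported for standard-layout classes.
static_assert(std::is_standard_layout<LayoutProbe>::value,
              "offsetof requires a standard-layout type");

// Later members have higher addresses, so the offsets strictly increase.
static_assert(offsetof(LayoutProbe, num_1) < offsetof(LayoutProbe, num_2),
              "num_1 precedes num_2");
static_assert(offsetof(LayoutProbe, num_2) < offsetof(LayoutProbe, num_3),
              "num_2 precedes num_3");

// Bytes from the start of num_1 to the end of num_3, padding included.
constexpr std::size_t span_bytes() {
    return offsetof(LayoutProbe, num_3) + sizeof(std::int64_t)
         - offsetof(LayoutProbe, num_1);
}

// Padding bytes inside that span: implementation-defined, possibly zero.
constexpr std::size_t padding_bytes() {
    return span_bytes()
         - (sizeof(std::int16_t) + sizeof(std::int32_t) + sizeof(std::int64_t));
}
```

Printing span_bytes() and padding_bytes() from main shows how much of the range is padding on a given compiler. Note that this only demonstrates the layout; it says nothing about whether writing the whole span with one memcpy is defined.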

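For the above example to be fully defined, I see these requirements, and I am not sure whether they hold:

Do these properties hold? Have I missed anything else?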

§8.5

(6.2) — if T is a (possibly cv-qualified) non-union class type, each non-static data member and each base-class subobject is zero-initialized and padding is initialized to zero bits;

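Now, the standard does not actually say that these zero bits are writable, but I cannot think of an architecture with that level of granularity in memory access permissions (nor would we want one).

So I would say that in practice overwriting these zeros will always be safe, even if the powers that be have not specifically declared it so.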

is legal to reinterpret the memory range of the three members as a "conceptual"(?) char array

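No. An arbitrary subset of an object's members is not itself an object of any kind. If you can't take sizeof a thing, it isn't a thing. Likewise, as the link you provided suggests, if you can't identify a thing to std::is_standard_layout, it isn't a thing.

Analogous would be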

size_t n = (char*)&num_3 - (char*)&num_1;

It compiles, but it is UB: pointers that are subtracted must belong to the same object.
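If all that is needed is the distance between the two members, it can be obtained without subtracting pointers at all, which is what the question's ZeroInts already does. A minimal sketch, assuming a hypothetical struct OffsetProbe with the question's members and a made-up helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in with the same data members as MyStandardLayout.
struct OffsetProbe {
    char mem_a;
    std::int16_t num_1;
    std::int32_t num_2;
    std::int64_t num_3;
    char mem_z;
};

// Distance in bytes from the start of num_1 to the start of num_3,
// computed from member offsets rather than from pointer arithmetic.
constexpr std::size_t distance_1_to_3() {
    return offsetof(OffsetProbe, num_3) - offsetof(OffsetProbe, num_1);
}
```

Both offsets are integer constants relative to the start of the enclosing object, so the subtraction is plain size_t arithmetic.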

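That said, even though the standard is not explicit, I think you are on safe ground. If MyStandardLayout is standard layout, then a subset of it is too, even though it has no name and is not an identifiable type of its own.

But I wouldn't do it. Assignment is absolutely safe, and may well be faster than memcpy. If the subset is meaningful and has many members, I would consider making it an explicit struct and using assignment instead of memcpy, taking advantage of the default memberwise copy constructor the compiler provides.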
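A sketch of that suggestion, under the assumption that regrouping the members is acceptable. Nums, Grouped, and zero_ints are hypothetical names, not anything from the question:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical regrouping: the three integers become a named sub-struct.
struct Nums {
    std::int16_t num_1;
    std::int32_t num_2;
    std::int64_t num_3;
};

struct Grouped {
    char mem_a;
    Nums nums;
    char mem_z;
};

// Nums() value-initializes a temporary, which zeroes all three members;
// the implicitly defined copy assignment then copies them member by member.
inline void zero_ints(Grouped& obj) {
    obj.nums = Nums();
}
```

Here the target of the write is one named object of type Nums, so the question of whether a range of adjacent members counts as "an object" never comes up. The price is that the layout may differ from the flat struct.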

Posting this as a partial answer regarding memcpy(&num_1, buf, sizeAll):

Note: another answer is more concise and definitive.

I asked:

  • memcpy must be allowed to write to multiple "memory objects" in this way at once, i.e. specifically

    • Calling memcpy with the target address of num_1 and a size that is larger than the size of the num_1 "object" is legal.
    • The C++ (14) Standard, AFAICT, refers the description of memcpy to the C99 Standard, and that one states:

    7.21.2.1 The memcpy function

    2 The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1.

    So for me, the question here is whether the target range we have here can be considered "an object" according to the C or C++ Standard.

Thinking and searching a bit further, I found this in the C Standard:

§ 6.2.6 Representations of types

§ 6.2.6.1 General

2 Except for bit-fields, objects are composed of contiguous sequences of one or more bytes, the number, order, and encoding of which are either explicitly specified or implementation-defined.

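So it is at least implied that "an object" => "contiguous sequence of bytes".

I am not so bold as to claim that the converse ("contiguous sequence of bytes" => "an object") holds, but at least "an object" does not seem to be defined any more strictly here.

Then, as quoted in the Q, §9.2/13 of the C++ Standard (and §1.8/5) seems to guarantee that we do have a contiguous sequence of bytes (including padding).

Then, §3.9/3 says: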

3 For any trivially copyable type T, if two pointers to T point to distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a base-class subobject, if the underlying bytes (1.7) making up obj1 are copied into obj2, obj2 shall subsequently hold the same value as obj1. [ Example:

T* t1p;
T* t2p;       
     // provided that t2p points to an initialized object ...         
std::memcpy(t1p, t2p, sizeof(T));  
     // at this point, every subobject of trivially copyable type in *t1p contains        
     // the same value as the corresponding subobject in *t2p

—end example ]

So this explicitly allows applying memcpy to entire objects of Trivially Copyable type.
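For contrast with the partial write in the question, this is the case §3.9/3 covers directly: copying all sizeof(T) bytes of one complete object into another of the same type. A minimal sketch; NumsBlock and copy_whole are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical trivially copyable type holding the three integers.
struct NumsBlock {
    std::int16_t num_1;
    std::int32_t num_2;
    std::int64_t num_3;
};

static_assert(std::is_trivially_copyable<NumsBlock>::value,
              "memcpy of the whole object relies on trivial copyability");

// Copies every byte of src, padding included, into dst.
// dst and src must be distinct objects: memcpy does not allow overlap.
inline void copy_whole(NumsBlock& dst, const NumsBlock& src) {
    std::memcpy(&dst, &src, sizeof(NumsBlock));
}
```

After the call, each member of dst holds the same value as the corresponding member of src, which is exactly what the quoted paragraph promises.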

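In the example, the three members make up a "trivially copyable sub-object", and in fact I think that wrapping them in an actual sub-object of a distinct type would still require the explicit object to have exactly the same memory layout as the three members: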

struct MyStandardLayout_Flat {
    char mem_a;
    int16_t num_1;
    int32_t num_2;
    int64_t num_3;
    char mem_z;
};

struct MyStandardLayout_Sub {
    int16_t num_1;
    int32_t num_2;
    int64_t num_3;
};

struct MyStandardLayout_Composite {
    char mem_a;
    // Note that the padding here is different from the padding in MyStandardLayout_Flat, but that doesn't change how num_* are laid out.
    MyStandardLayout_Sub nums;
    char mem_z;
};

The memory layout of _Composite.nums and that of the three members of _Flat should be exactly the same, because the same basic rules apply.
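This layout claim can be tested rather than assumed, by comparing offsets measured from num_1. The sketch below uses hypothetical short names (Flat, Sub, and the *_rel_* helpers) for the structs above. It is worth running: on common ABIs where int16_t is 2-byte aligned and int32_t is 4-byte aligned (typical x86-64, for example), num_1 in the flat struct can sit at offset 2 with num_2 directly after it at offset 4, while in the sub-struct num_1 sits at offset 0 and num_2 at offset 4. In that case the relative offsets differ and the two layouts are not the same.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical short names for MyStandardLayout_Flat and MyStandardLayout_Sub.
struct Flat {
    char mem_a;
    std::int16_t num_1;
    std::int32_t num_2;
    std::int64_t num_3;
    char mem_z;
};

struct Sub {
    std::int16_t num_1;
    std::int32_t num_2;
    std::int64_t num_3;
};

// Offsets of num_2 and num_3 measured from num_1 in the flat struct.
constexpr std::size_t flat_rel_2() {
    return offsetof(Flat, num_2) - offsetof(Flat, num_1);
}
constexpr std::size_t flat_rel_3() {
    return offsetof(Flat, num_3) - offsetof(Flat, num_1);
}

// In Sub, num_1 is the first member (offset 0), so the plain offsets
// are already relative to it.
constexpr std::size_t sub_rel_2() { return offsetof(Sub, num_2); }
constexpr std::size_t sub_rel_3() { return offsetof(Sub, num_3); }

// True only if the three integers are spaced identically in both structs.
constexpr bool same_relative_layout() {
    return flat_rel_2() == sub_rel_2() && flat_rel_3() == sub_rel_3();
}
```

If same_relative_layout() returns false on the target compiler, the sub-struct is not a drop-in description of the byte range in the flat struct, and a buffer prepared for one cannot be blitted onto the other.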

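So, in conclusion, assuming that the "sub-object" num_1 to num_3 is represented by the same contiguous sequence of bytes as a full trivially copyable sub-object would be, I:

  • find it very, very hard to imagine an implementation or optimizer that breaks this
  • would say it can be either:
    • read as Undefined Behavior, iff we conclude that C++ §3.9/3 implies that only (full) objects of Trivially Copyable type may be treated this way by memcpy, or conclude from C99 §6.2.6.1/2 and the general specification of memcpy in 7.21.2.1 that the contiguous sequence of num_* bytes does not constitute a "valid object" for the purposes of memcpy.
    • read as Defined Behavior, iff we conclude that C++ §3.9/3 does not normatively limit the applicability of memcpy to other types or memory ranges, and conclude that the definition of memcpy (and of the term "object") in the C99 Standard allows treating the adjacent variables as a single contiguous-bytes target object.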