memcpy derived class to base class, why is the base class function still called

Question

I am reading Inside the C++ Object Model. Section 1.3 says:

So, then, why is it that, given

Bear b; 
ZooAnimal za = b; 

// ZooAnimal::rotate() invoked 
za.rotate();

the instance of rotate() invoked is the ZooAnimal instance and not that of Bear? Moreover, if memberwise initialization copies the values of one object to another, why is za's vptr not addressing Bear's virtual table?

The answer to the second question is that the compiler intercedes in the initialization and assignment of one class object with another. The compiler must ensure that if an object contains one or more vptrs, those vptr values are not initialized or changed by the source object.

So I wrote the following test code:

#include <stdio.h>
#include <string.h>

class Base {
public:
    virtual void vfunc() { puts("Base::vfunc()"); }
};

class Derived : public Base {
public:
    virtual void vfunc() { puts("Derived::vfunc()"); }
};

int main()
{
    Derived d;
    Base b_assign = d;                    // copy-initialize a Base from d
    Base b_memcpy;
    memcpy(&b_memcpy, &d, sizeof(Base));  // raw byte copy of the first sizeof(Base) bytes of d

    b_assign.vfunc();
    b_memcpy.vfunc();

    printf("sizeof Base : %zu\n", sizeof(Base));  // %zu is the conversion for size_t

    Base &b_ref = d;
    b_ref.vfunc();

    // Dump the first int-sized word of each object (where the vptr sits on this build).
    printf("b_assign: %x; b_memcpy: %x; b_ref: %x\n",
        *(int *)&b_assign,
        *(int *)&b_memcpy,
        *(int *)&b_ref);
    return 0;
}

Result:

Base::vfunc()
Base::vfunc()
sizeof Base : 4
Derived::vfunc()
b_assign: 80487b4; b_memcpy: 8048780; b_ref: 8048780

My question is: why does b_memcpy still call Base::vfunc()?

Answer 1

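What you are doing is illegal in the C++ language, which means that the behavior of your b_memcpy object is undefined. The latter means that any behavior is "correct" and your expectations are completely unfounded. There is not much point in trying to analyze undefined behavior: it is not supposed to follow any logic.

In practice, your memcpy most likely did copy Derived's virtual table pointer into the b_memcpy object. The values you printed confirm that: b_memcpy and b_ref show the same first word. However, when a virtual method is called through an immediate object (as in the b_memcpy.vfunc() call), most implementations optimize away the virtual table access and make a direct (non-virtual) call to the target function. The formal rules of the language state that no legal operation can ever make the b_memcpy.vfunc() call dispatch to anything other than Base::vfunc(), which is why the compiler can safely replace this call with a direct call to Base::vfunc(). This is also why virtual table manipulation normally has no effect on the b_memcpy.vfunc() call.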

Answer 2

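You are not allowed to do

memcpy(&b_memcpy, &d, sizeof(Base));

This is undefined behavior, because b_memcpy and d are not "plain old data" objects: they have virtual member functions, so their types are not trivially copyable.

If you had written:

b_memcpy = d;

then it would print Base::vfunc() as expected.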

Answer 3

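The behavior you are invoking is undefined because the standard says it is undefined, and your compiler takes advantage of that fact. Let's look at g++ for a concrete example. The assembly it generates for the line b_memcpy.vfunc(); with optimizations disabled looks like this: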

lea     rax, [rbp-48]
mov     rdi, rax
call    Base::vfunc()

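As you can see, the vtable is not even referenced. Since the compiler knows the static type of b_memcpy, it has no reason to dispatch that method call polymorphically. b_memcpy can only be a Base object, so the compiler just generates a call to Base::vfunc(), as it would for any other method call.

Going a bit further, let's add a function like this: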

void callVfunc(Base& b)
{
  b.vfunc();
}

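Now, if we call callVfunc(b_memcpy); instead, we see something different, and the result depends on the optimization level I compile the code at. At -O0 and -O1, Derived::vfunc() is called; at -O2 and -O3, Base::vfunc() is printed. Again, since the standard says the behavior of your program is undefined, the compiler makes no effort to produce a predictable result and simply relies on the assumptions the language allows it to make. Since the compiler knows b_memcpy is a Base object, it can simply inline the call to puts("Base::vfunc()"); when the optimization level allows it.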

Answer 4

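Any use of the vptr is outside the scope of the standard

Of course, using memcpy here has UB

The answers pointing out that any use of memcpy, or any other byte manipulation of non-PODs (that is, of any object with a vptr), has undefined behavior are strictly technically correct, but they do not answer the question. The question is predicated on the existence of a vptr (vtable pointer), which is not even mandated by the standard. Of course the answer will involve facts outside the standard, and the result will not be guaranteed by the standard!

The standard text is not concerned with the vptr

The issue is not that you are not allowed to manipulate the vptr; the notion of being allowed by the standard to manipulate something that the standard text does not even describe is absurd. Of course there is no standard way to change the vptr, and that is beside the point.

The vptr encodes the type of a polymorphic object

The issue here is not what the standard says about the vptr. The issue is what the vptr represents, and what the standard says about that: the vptr represents the dynamic type of an object. Whenever the result of an operation depends on the dynamic type, the compiler generates code that uses the vptr.

[Note regarding MI: I say "the" vptr (as if there were only one), but when MI (multiple inheritance) is involved, an object can have more than one vptr, each representing the complete object viewed as a particular polymorphic base class type. (A polymorphic class is a class with at least one virtual function.)]

[Note regarding virtual bases: I mention only the vptr, but some compilers insert other pointers to represent aspects of the dynamic type, such as the location of virtual base subobjects, while other compilers use the vptr for that purpose. What is true of the vptr is also true of these other internal pointers.]

So a particular value of the vptr corresponds to a dynamic type: the type of the most derived object.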

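Changes of the dynamic type of an object during its lifetime

During construction the dynamic type changes, which is why virtual function calls made from inside a constructor can be "surprising". Some people say that the rules for calling virtual functions during construction are special, but they absolutely are not: the final overrider is called. That overrider is the one belonging to the class of the most derived object that has been constructed so far, and inside a constructor C::C(arg-list) that is always the class C.

During destruction, the dynamic type changes in the reverse order. Virtual function calls from inside destructors follow the same rules.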

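What it means for something to be undefined

You can perform low-level manipulations that are not sanctioned by the standard. That a behavior is not explicitly defined in the C++ standard does not imply that it is not described elsewhere. Just because the result of a manipulation is explicitly described as having UB (undefined behavior) in the C++ standard does not mean that your implementation cannot define it.

You can also use your knowledge of how the compiler works: under strict separate compilation, that is, when the compiler can get no information from separately compiled code, every separately compiled function is a "black box". You can use this fact: the compiler has to assume that anything a separately compiled function could do will be done. Even inside a given function, you can use an asm directive to get the same effect: an asm directive with no constraints can do anything that is legal in C++. The effect is a "forget what you know from code analysis at that point" directive.

The standard describes what can change the dynamic type, and nothing is allowed to change it except construction/destruction, so only an "external" (black box) function that is otherwise allowed to perform construction/destruction can change a dynamic type.

Calling a constructor on an existing object is not allowed, except to reconstruct it with the exact same type (and with restrictions); see [basic.life]/8: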

If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, a reference that referred to the original object, or the name of the original object will automatically refer to the new object and, once the lifetime of the new object has started, can be used to manipulate the new object, if:

(8.1) the storage for the new object exactly overlays the storage location which the original object occupied, and

(8.2) the new object is of the same type as the original object (ignoring the top-level cv-qualifiers), and

(8.3) the type of the original object is not const-qualified, and, if a class type, does not contain any non-static data member whose type is const-qualified or a reference type, and

(8.4) the original object was a most derived object ([intro.object]) of type T and the new object is a most derived object of type T (that is, they are not base class subobjects).

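This means that the only case where you could call a constructor (with placement new) and still use the same expressions that used to designate the object (its name, pointers to it, etc.) is the one where the dynamic type does not change, so the vptr would still be the same.

In other words, if you want to overwrite the vptr using low-level tricks, you can; but only if you write the same value.

In other words, don't try to hack the vptr.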
