Why is the copy constructor called twice in heap array initialization?

Question

For the following C++14 code, why does the code that g++ generates for new A[1]{x} seem to invoke the copy constructor twice?

#include <iostream>
using namespace std;

class A {
public:
    A()           { cout << "default ctor" << endl; }
    A(const A& o) { cout << "copy ctor" << endl;    }
    ~A()          { cout << "dtor" << endl;         }
};

int main()
{
    A x;
    cout << "=========" << endl;
    A* y = new A[1]{x};
    cout << "=========" << endl;
    delete[] y;
    return 0;
}

Compiling and running it gives:

$ g++ -fno-elide-constructors -std=c++14 test.cpp && ./a.out
default ctor
=========
copy ctor
copy ctor
dtor
=========
dtor
dtor

Interestingly, for the same code clang++ calls the copy constructor only once:

$ clang++ -fno-elide-constructors -std=c++14 test.cpp && ./a.out
default ctor
=========
copy ctor
=========
dtor
dtor

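Furthermore, with g++, changing the line A* y = new A[1]{x}; to either of the following results in the copy constructor being called only once:

A* y = new A {x}; - a plain heap object instead of an array of size 1
A y[1] {x}; - an array on the stack instead of on the heap

So the double copy constructor call seems to occur only with heap array initialization.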

Answer 1

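After some research into the standard, I have come to the conclusion that g++ is wrong and that there should be only one copy constructor call. Interestingly, there seem to be two possible interpretations of the kind of initialization that takes place here. Both lead to the same conclusion.

First interpretation - direct initialization

From the C++14 standard (Working Draft), [expr.new] 17: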

A new-expression that creates an object of type T initializes that object as follows:

(17.1) — If the new-initializer is omitted, the object is default-initialized (8.5). [ Note: If no initialization is performed, the object has an indeterminate value. — end note ]

(17.2) — Otherwise, the new-initializer is interpreted according to the initialization rules of 8.5 for direct initialization.

In our case the new-initializer is present, so (per 17.2) new A[1]{x} is interpreted according to the direct-initialization rules. Let's look at [dcl.init] 16:

The initialization that occurs in the forms

T x(a);

T x{a};

as well as in new expressions (5.3.4), static_cast expressions (5.2.9), functional notation type conversions (5.2.3), mem-initializers (12.6.2), and the braced-init-list form of a condition is called direct-initialization

OK, this further confirms that we are dealing with direct-initialization. Now let's see how direct-initialization works in [dcl.init] 17:

The semantics of initializers are as follows. The destination type is the type of the object or reference being initialized and the source type is the type of the initializer expression. If the initializer is not a single (possibly parenthesized) expression, the source type is not defined.

[... 17.1 through 17.5 omitted ...]

(17.6) — If the destination type is a (possibly cv-qualified) class type:

(17.6.1) — If the initialization is direct-initialization, or if it is copy-initialization where the cv-unqualified version of the source type is the same class as, or a derived class of, the class of the destination, constructors are considered. The applicable constructors are enumerated (13.3.1.3), and the best one is chosen through overload resolution (13.3). The constructor so selected is called to initialize the object, with the initializer expression or expression-list as its argument(s). If no constructor applies, or the overload resolution is ambiguous, the initialization is ill-formed.

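According to the excerpt above, when the object being initialized is of class type (as it is here) and we are dealing with direct-initialization (as we are here), the destination object is initialized using the best-matching constructor.

I won't go through the rules for how that constructor is selected. With only the default constructor A::A() and the copy constructor A::A(const A&) available, the copy constructor is clearly the better choice when initializing from x, which is of type A. This is where one of the copy constructor calls comes from.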

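I did not find any remarks about array initialization, in particular in the [expr.new] section, that would explain why it should result in a second constructor call.

Second interpretation - copy initialization

Here we can start from [dcl.init.list] 1: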

List-initialization is initialization of an object or reference from a braced-init-list. Such an initializer is called an initializer list, and the comma-separated initializer-clauses of the list are called the elements of the initializer list. An initializer list may be empty. List-initialization can occur in direct-initialization or copy initialization contexts; list-initialization in a direct-initialization context is called direct-list-initialization and list-initialization in a copy-initialization context is called copy-list-initialization. [ Note: List-initialization can be used

(1.1) — as the initializer in a variable definition (8.5)

(1.2) — as the initializer in a new-expression (5.3.4)

[... 1.3 through 1.10 omitted ...]

— end note ]

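This excerpt can be read as saying that new A[1]{x} is actually a form of list-initialization rather than direct-initialization, because the braced-init-list {x} is used. Assuming that is the case, let's see how it works in [dcl.init.list] 3: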

List-initialization of an object or reference of type T is defined as follows:

[... 3.1 through 3.2 omitted ...]

(3.3) — Otherwise, if T is an aggregate, aggregate initialization is performed (8.5.1).

[... 3.4 through 3.10 omitted ...]

In our case point 3.3 applies, because we are initializing an array, which is an aggregate according to [dcl.init.aggr] 1:

An aggregate is an array or a class (Clause 9) with no user-provided constructors (12.1), no private or protected non-static data members (Clause 11), no base classes (Clause 10), and no virtual functions (10.3).

So let's see how aggregate initialization is performed in [dcl.init.aggr] 2:

When an aggregate is initialized by an initializer list, as specified in 8.5.4, the elements of the initializer list are taken as initializers for the members of the aggregate, in increasing subscript or member order. Each member is copy-initialized from the corresponding initializer-clause. If the initializer-clause is an expression and a narrowing conversion (8.5.4) is required to convert the expression, the program is ill-formed.

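This snippet tells us that the elements are copy-initialized. Therefore y[0] will be copy-initialized from x. Now let's see how copy-initialization works in [dcl.init] 17: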

The semantics of initializers are as follows. The destination type is the type of the object or reference being initialized and the source type is the type of the initializer expression. If the initializer is not a single (possibly parenthesized) expression, the source type is not defined.

[... 17.1 through 17.5 omitted ...]

(17.6) — If the destination type is a (possibly cv-qualified) class type:

(17.6.1) — If the initialization is direct-initialization, or if it is copy-initialization where the cv-unqualified version of the source type is the same class as, or a derived class of, the class of the destination, constructors are considered. The applicable constructors are enumerated (13.3.1.3), and the best one is chosen through overload resolution (13.3). The constructor so selected is called to initialize the object, with the initializer expression or expression-list as its argument(s). If no constructor applies, or the overload resolution is ambiguous, the initialization is ill-formed.

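Just like last time, this initialization meets the requirements of point 17.6.1, because it is copy-initialization where the source type (A, the type of x) is the same as the destination type (A, the type of y[0]). This means that the copy constructor is called in this case as well.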

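Conclusion

It seems that whichever interpretation we choose, only one constructor should be called, and Clang is right. I could not find any evidence that a temporary should be created. As further example-based evidence, other compilers such as icc and (admittedly clang-based) zapcc and elcc agree with clang: all of them make a single copy constructor call.

I don't know much about the inner workings of g++, but I have a theory as to why it performs two copy constructor calls. It is possible that g++ internally uses some auxiliary constructor calls that are normally always optimized away, and that the -fno-elide-constructors flag breaks the invariant that they are always optimized away. However, this is pure speculation on my part, so please correct me if I'm wrong.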

Answer 2

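TL;DR: This is probably a GCC defect in which {x} is misinterpreted as a temporary in this context. For each element in new A[N]{x1, x2, ... xN}, the copy constructor should be called once according to [dcl.init] and [expr.new]. Instead, GCC likely interprets it as an initializer list, and therefore partially as an intermediate rvalue. However, we can force GCC to interpret it otherwise.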

why does g++'s generated code for new A[1]{x} seem to invoke the copy constructor twice?

Because there is no move constructor. If we add a move constructor and some more output, we get a better picture of the situation (Compiler Explorer):

#include <iostream>
using namespace std;

class A {
public:
    A()           { cout << "default ctor @" << this << endl; }
    A(A&& o)      { cout << "move ctor: " << &o << " to " << this << endl;    }
    A(const A& o) { cout << "copy ctor: " << &o << " to " << this << endl;    }
    ~A()          { cout << "dtor @" << this << endl;         }
};

int main()
{
    A x;
    cout << "=========" << endl;
    A* y = new A[1]{x};
    cout << "=========" << endl;
    delete[] y;
    return 0;
}

Note that the presence of our new A(A&&) constructor reveals the intermediate temporary:

default ctor @0x7ffec28b5476
=========
copy ctor: 0x7ffec28b5476 to 0x7ffec28b5477
move ctor: 0x7ffec28b5477 to 0x55d0a7fa6288
dtor @0x7ffec28b5477
=========
dtor @0x55d0a7fa6288
dtor @0x7ffec28b5476

Indeed, if we declare the move constructor as A(A&&) = delete, g++ will not even compile the code any more (whereas Clang still accepts it).

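g++ seems to misinterpret the braced-init-list. IMHO, [expr.new] might allow this interpretation, but it looks like a g++ defect and should be reported as such.

However, the whole ordeal reminded me of one of my older questions (Are curly braces really required around initialization?). So let's introduce some more braces to make sure that g++ does not misinterpret our initializer: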

int main()
{
    A x;
    cout << "=========" << endl;
    A* y = new A[1]{{{x}}};
    cout << "=========" << endl;
    delete[] y;
    return 0;
}

This variant circumvents g++'s behaviour. The braces are read as follows:

initializer for T[1]     start : {
initializer for first element  : {
actual initializer for A       : {x}

The program output is then (Explorer):

default ctor @0x7ffede3d9967
=========
copy ctor: 0x7ffede3d9967 to 0x1eb0ec8
=========
dtor @0x1eb0ec8
dtor @0x7ffede3d9967

So for multiple elements we end up in curly brace hell (Compiler Explorer):

int main()
{
    A x;
    cout << "=========" << endl;
    A* y = new A[2]{{{x}},{{x}}};
    cout << "=========" << endl;
    delete[] y;
    return 0;
}

Again, no additional constructors are called:

default ctor @0x7fff3a2a7a27
=========
copy ctor: 0x7fff3a2a7a27 to 0x1f49ec8
copy ctor: 0x7fff3a2a7a27 to 0x1f49ec9
=========
dtor @0x1f49ec9
dtor @0x1f49ec8
dtor @0x7fff3a2a7a27
