push_back() and emplace_back() behind the scenes

Question

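I'm currently teaching myself C++, and I'm curious about how push_back() and emplace_back() work under the hood. I've always assumed that emplace_back() is faster when you construct a large object and push it onto the back of a container such as a vector.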

Suppose I have a Student object that I want to append to the back of a vector of Students.

struct Student {
   string name;
   int student_ID;
   double GPA;
   string favorite_food;
   string favorite_prof;
   int hours_slept;
   int birthyear;
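   Student(string name_in, int ID_in, double GPA_in, string food_in, 
           string prof_in, int sleep_in, int birthyear_in) :
           name(name_in), student_ID(ID_in), GPA(GPA_in),
           favorite_food(food_in), favorite_prof(prof_in),
           hours_slept(sleep_in), birthyear(birthyear_in) { }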
};

Suppose I call push_back() and push a Student object onto the end of the vector:

vector<Student> vec;
vec.push_back(Student("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997));

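My understanding here is that push_back creates an instance of the Student object outside the vector and then moves it to the back of the vector.

Diagram:

I can also emplace instead of push: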

vector<Student> vec;
vec.emplace_back("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997);

My understanding is that the Student object is constructed directly at the very back of the vector, so no move is required.

Diagram:

So it would make sense for emplacing to be faster, especially when many Student objects are added. However, when I timed these two versions of the code:

for (int i = 0; i < 10000000; ++i) {
    vec.push_back(Student("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997));
}

and

for (int i = 0; i < 10000000; ++i) {
    vec.emplace_back("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997);
}

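I expected the latter to be faster, because the large Student object would not have to be moved. Oddly enough, the emplace_back version ended up being slower (over multiple attempts). I also tried inserting 10000000 Student objects where the constructor takes its arguments by reference and the arguments to push_back() and emplace_back() are stored in variables. That didn't help either; emplace_back was still slower.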

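I've checked to make sure that I'm inserting the same number of objects in both cases. The time difference isn't large, but the emplace_back version ended up slower by a few seconds.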

Is there something wrong with my understanding of how push_back() and emplace_back() work? Thank you very much for your time!

Here's the code, as requested. I'm using the g++ compiler.

push_back:

#include <string>
#include <vector>
using namespace std;

struct Student {
   string name;
   int student_ID;
   double GPA;
   string favorite_food;
   string favorite_prof;
   int hours_slept;
   int birthyear;
   Student(string name_in, int ID_in, double GPA_in, string food_in, 
           string prof_in, int sleep_in, int birthyear_in) :
           name(name_in), student_ID(ID_in), GPA(GPA_in), 
           favorite_food(food_in), favorite_prof(prof_in),
           hours_slept(sleep_in), birthyear(birthyear_in) {}
};

int main() {
    vector<Student> vec;
    vec.reserve(10000000);
    for (int i = 0; i < 10000000; ++i) 
         vec.push_back(Student("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997));
    return 0;
}

emplace_back:

#include <string>
#include <vector>
using namespace std;

struct Student {
   string name;
   int student_ID;
   double GPA;
   string favorite_food;
   string favorite_prof;
   int hours_slept;
   int birthyear;
   Student(string name_in, int ID_in, double GPA_in, string food_in, 
           string prof_in, int sleep_in, int birthyear_in) :
           name(name_in), student_ID(ID_in), GPA(GPA_in), 
           favorite_food(food_in), favorite_prof(prof_in),
           hours_slept(sleep_in), birthyear(birthyear_in) {}
};

int main() {
    vector<Student> vec;
    vec.reserve(10000000);
    for (int i = 0; i < 10000000; ++i) 
         vec.emplace_back("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997);
    return 0;
}

Answer 1

This behavior is due to the complexity of std::string. A couple of things interact here:

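The small string optimization (SSO)
In the push_back version, the compiler is able to determine the length of the string at compile time, whereas it cannot do so for the emplace_back version. The emplace_back call therefore requires calls to strlen. Furthermore, since the compiler doesn't know the length of the string literal, it has to emit code for both the SSO and non-SSO cases (see Jason Turner's "Initializer Lists Are Broken, Let's Fix Them"; it's a long talk, but he follows the issue of inserting strings into a vector throughout it)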

Consider this simpler type:

struct type {
  std::string a;
  std::string b;
  std::string c;

  type(std::string a, std::string b, std::string c)
    : a{a}
    , b{b}
    , c{c}
  {}
};

Note how the constructor copies a, b, and c.

Testing this against a baseline of just allocating memory, we can see that push_back outperforms emplace_back:

[Figure: quick-bench results]

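Because the strings in your example all fit inside the SSO buffer, copying is just as cheap as moving in this case. The constructor is therefore quite efficient, and the improvement from emplace_back has a smaller effect.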

Also, if we search the assembly for the call to push_back and the call to emplace_back:

// push_back call
void foo(std::vector<type>& vec) {
    vec.push_back({"Bob", "pizza", "Smith"});
}

// emplace_back call
void foo(std::vector<type>& vec) {
    vec.emplace_back("Bob", "pizza", "Smith");
}

(Assembly not copied here. It's massive. std::string is complicated.)

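We can see that emplace_back makes calls to strlen, whereas push_back does not. Because of the increased distance between the string literal and the std::string being constructed, the compiler is unable to optimize away the call to strlen.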

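Explicitly calling the std::string constructor removes the calls to strlen, but the strings are then no longer constructed in place, so this doesn't speed up emplace_back.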

All of this said, if we leave the SSO by using long enough strings, the allocation cost completely drowns out these details, so emplace_back and push_back have the same performance:

[Figure: quick-bench results]

If you fix the constructor of type to move its arguments, emplace_back becomes faster in all cases.

struct type {
  std::string a;
  std::string b;
  std::string c;

  type(std::string a, std::string b, std::string c)
    : a{std::move(a)}
    , b{std::move(b)}
    , c{std::move(c)}
  {}
};

SSO case

[Figure: quick-bench results]

Long case

[Figure: quick-bench results]

However, the SSO push_back case slowed down; the compiler seems to emit extra copies.

The optimal version with perfect forwarding does not suffer from this drawback (note the scale change on the vertical axis):

struct type {
  std::string a;
  std::string b;
  std::string c;

  template <typename A, typename B, typename C>
  type(A&& a, B&& b, C&& c)
    : a{std::forward<A>(a)}
    , b{std::forward<B>(b)}
    , c{std::forward<C>(c)}
  {}
};

[Figure: quick-bench results]
