Strange values in a lambda returning initializer_list

Consider this C++11 code snippet:

#include <iostream>
#include <set>
#include <stdexcept>
#include <initializer_list>


int main(int argc, char ** argv)
{
    enum Switch {
        Switch_1,
        Switch_2,
        Switch_3,
        Switch_XXXX,
    };

    int foo_1 = 1;
    int foo_2 = 2;
    int foo_3 = 3;
    int foo_4 = 4;
    int foo_5 = 5;
    int foo_6 = 6;
    int foo_7 = 7;

    auto get_foos = [=] (Switch ss) -> std::initializer_list<int> {
        switch (ss) {
            case Switch_1:
                return {foo_1, foo_2, foo_3};
            case Switch_2:
                return {foo_4, foo_5};
            case Switch_3:
                return {foo_6, foo_7};
            default:
                throw std::logic_error("invalid switch");
        }
    };

    std::set<int> foos = get_foos(Switch_1);
    for (auto && foo : foos) {
        std::cout << foo << " ";
    }
    std::cout << std::endl;
    return 0;
}

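No matter which compiler I try, none of them seems to handle it correctly. This makes me think I am doing something wrong, rather than hitting the same bug in several compilers.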

clang 3.5 output:

-1078533848 -1078533752 134518134

gcc 4.8.2 output:

-1078845996 -1078845984 3

gcc 4.8.3 output (compiled on http://www.tutorialspoint.com):

1 2 267998238

gcc (unknown version) output (compiled on http://coliru.stacked-crooked.com):

-1785083736 0 6297428 

The problem seems to be caused by using std::initializer_list<int> as the return type of the lambda. If I change the lambda definition to [=] (Switch ss) -> std::set<int> {...}, the returned values are correct.
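For reference, a minimal sketch of that working variant. The wrapper function get_foos_as_set is a hypothetical name introduced only so the lambda can be called from elsewhere; the lambda body is the same as above, with only the return type changed.

```cpp
#include <set>
#include <stdexcept>

enum Switch { Switch_1, Switch_2, Switch_3, Switch_XXXX };

// Hypothetical wrapper: builds the lambda with an owning return type.
// Each braced list is copied into a std::set inside the return statement,
// so nothing refers to the temporary backing array afterwards.
std::set<int> get_foos_as_set(Switch ss) {
    int foo_1 = 1, foo_2 = 2, foo_3 = 3, foo_4 = 4;
    int foo_5 = 5, foo_6 = 6, foo_7 = 7;

    auto get_foos = [=] (Switch s) -> std::set<int> {
        switch (s) {
            case Switch_1:
                return {foo_1, foo_2, foo_3};
            case Switch_2:
                return {foo_4, foo_5};
            case Switch_3:
                return {foo_6, foo_7};
            default:
                throw std::logic_error("invalid switch");
        }
    };
    return get_foos(ss);
}
```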

Please help me solve this mystery.

From http://en.cppreference.com/w/cpp/utility/initializer_list:

The underlying array is not guaranteed to exist after the lifetime of the original initializer list object has ended. The storage for std::initializer_list is unspecified (i.e. it could be automatic, temporary, or static read-only memory, depending on the situation).

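Copying an initializer_list does not copy the underlying array; the copy refers to the same elements. std::set and other containers, by contrast, copy their elements. Basically, your code behaves like "returning a reference to a temporary".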

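C++14 words the rules for the underlying storage slightly differently (it talks about extending its lifetime), but that does not fix anything related to the lifetime of the initializer_list object itself, let alone copies of it. So the problem remains, even in C++14: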

The underlying array is a temporary array, in which each element is copy-initialized (except that narrowing conversions are invalid) from the corresponding element of the original initializer list. The lifetime of the underlying array is the same as any other temporary object, except that initializing an initializer_list object from the array extends the lifetime of the array exactly like binding a reference to a temporary (with the same exceptions, such as for initializing a non-static class member). The underlying array may be allocated in read-only memory.

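The problem is that you are referring to an object that no longer exists, so you are invoking undefined behavior. initializer_list seems underspecified in the C++11 draft standard: no normative section actually specifies this behavior. There are, however, plenty of notes indicating that this will not work. Notes are not normative, but when they do not conflict with the normative text they are strongly indicative.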

If we go to section 18.9 Initializer lists, it has a note that says:

Copying an initializer list does not copy the underlying elements.
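A minimal sketch of what that note means in practice. The function name copy_shares_storage is a hypothetical helper, not something from the standard: it just checks that a copy of the list points at the same elements as the original.

```cpp
#include <initializer_list>

// Hypothetical helper: returns true when a copy of the list refers to the
// very same elements as the original, i.e. the copy is shallow.
bool copy_shares_storage(std::initializer_list<int> il) {
    std::initializer_list<int> copy = il;  // copies the handle, not the elements
    return copy.begin() == il.begin() && copy.size() == il.size();
}
```

Calling copy_shares_storage({1, 2, 3}) is fine, because the backing array is still alive for the duration of the call. The trouble only starts when a copy outlives that array.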

In section 8.5.4 we have the following example:

typedef std::complex<double> cmplx;
std::vector<cmplx> v1 = { 1, 2, 3 };

void f() {
    std::vector<cmplx> v2{ 1, 2, 3 };
    std::initializer_list<int> i3 = { 1, 2, 3 };
}

with the following note:

For v1 and v2, the initializer_list object and array created for { 1, 2, 3 } have full-expression lifetime. For i3, the initializer_list object and array have automatic lifetime.
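To illustrate the full-expression case, here is a small sketch of a usage pattern that stays within those lifetimes. The names to_vector and make_three are hypothetical and chosen for this example only.

```cpp
#include <initializer_list>
#include <vector>

// Safe: the list is consumed (copied into an owning vector) while the
// caller's braced list, and therefore its backing array, is still alive.
std::vector<int> to_vector(std::initializer_list<int> il) {
    return std::vector<int>(il.begin(), il.end());
}

// The backing array of {a, b, c} lives until the end of the full-expression
// containing the call, which is long enough for to_vector to copy from it.
std::vector<int> make_three(int a, int b, int c) {
    return to_vector({a, b, c});
}
```

Passing an initializer_list down into a function is fine; it is handing one back up to the caller that leaves it pointing at a dead array.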

These notes are consistent with the initializer_list proposal, N2215, which gives the following example:

std::vector<double> v = {1, 2, 3.14};

and says:

Now add vector(initializer_list<E>) to vector<E> as shown above. Now, the example works. The initializer list {1, 2, 3.14} is interpreted as a temporary constructed like this:

const double temp[] = {double(1), double(2), 3.14 } ;
initializer_list<double> tmp(temp,
sizeof(temp)/sizeof(double));
vector<double> v(tmp);

[...]

Note that an initializer_list is a small object (probably two words), so passing it by value makes sense. Passing by value also simplifies inlining of begin() and end() and constant expression evaluation of size().

An initializer_list s will be created by the compiler, but can be copied by users. Think of it as a pair of pointers.

In your case, the initializer_list just holds pointers to an automatic variable, which no longer exists once the scope is exited.

Update

I just realized that the proposal actually points out this misuse scenario:

One implication is that an initializer_list is “pointer like” in that it behaves like a pointer in respect to the underlying array. For example:

int * f(int a)
{ 
   int* p = &a;
   return p; //bug waiting to happen
}

initializer_list<int> g(int a, int b, int c)
{
   initializer_list<int> v = { a, b, c };
   return v; // bug waiting to happen
} 

It actually takes a minor amount of ingenuity to misuse an initializer_list this way. In particular, variables of type initializer_list are going to be rare.

I find the last sentence (emphasis mine) particularly ironic.

Update 2

Defect report 1290 fixes the normative wording so that it now covers this behavior, although the copy case could be more explicit. It says:

A question has arisen over expected behavior when an initializer_list is a non-static data member of a class. Initialization of an initializer_list is defined in terms of construction from an implicitly allocated array whose lifetime "is the same as that of the initializer_list object". That would mean that the array needs to live as long as the initializer_list does, which would on the face of it appear to require the array to be stored in something like a std::unique_ptr within the same class (if the member is initialized in this manner).

It would be surprising if that was the intent, but it would make initializer_list usable in this context.

The resolution fixed the wording, and we can find the new wording in the N3485 version of the draft standard. Section 8.5.4 [dcl.init.list] now says:

The array has the same lifetime as any other temporary object (12.2), except that initializing an initializer_list object from the array extends the lifetime of the array exactly like binding a reference to a temporary.

and 12.2 [class.temporary] says:

The lifetime of a temporary bound to the returned value in a function return statement (6.6.3) is not extended; the temporary is destroyed at the end of the full-expression in the return statement.

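std::initializer_lists do not extend the lifetime of the array they refer to when they are themselves copied or moved into the result of the copy or move. This makes returning them problematic. (They do extend the lifetime of the referenced array to their own lifetime, but that extension is not transitive over elision or copies of the list.)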

To fix this problem, store the data yourself and manage its lifetime manually:

#include <algorithm>
#include <array>
#include <cstddef>
#include <initializer_list>
#include <iterator>
#include <type_traits>

// Copy at most `size` elements from [begin, end) into a std::array.
template<std::size_t size, class T>
std::array<T, size> partial_array( T const* begin, T const* end ) {
  std::array<T, size> retval{};  // value-initialize so unused slots are well defined
  std::size_t delta = (std::min)( size, static_cast<std::size_t>(end-begin) );
  std::copy( begin, begin+delta, retval.begin() );
  return retval;
}
template<class T, std::size_t max_size>
struct capped_array {
  std::array<T, max_size> storage;
  std::size_t used = 0;
  // Convert from a std::array or capped_array whose capacity fits in max_size.
  template<std::size_t osize, class=typename std::enable_if< (osize<=max_size) >::type>
  capped_array( std::array<T, osize> const& rhs ):
    capped_array( rhs.data(), rhs.data()+osize )
  {}
  template<std::size_t osize, class=typename std::enable_if< (osize<=max_size) >::type>
  capped_array( capped_array<T, osize> const& rhs ):
    capped_array( rhs.begin(), rhs.end() )
  {}
  // Copy and move operations are implicitly generated; the constructor
  // templates above never suppress them.

  // finish-start MUST NOT exceed max_size, or we will truncate
  capped_array( T const* start, T const* finish ):
    storage( partial_array<max_size>(start, finish) ),
    used((std::min)(static_cast<std::size_t>(finish-start), max_size))
  {}
  T* begin() { return storage.data(); }
  T* end() { return storage.data()+used; }
  T const* begin() const { return storage.data(); }
  T const* end() const { return storage.data()+used; }
  std::size_t size() const { return used; }
  bool empty() const { return !used; }
  T& front() { return *begin(); }
  T const& front() const { return *begin(); }
  T& back() { return *std::prev(end()); }
  T const& back() const { return *std::prev(end()); }

  capped_array( std::initializer_list<T> il ):
    capped_array(il.begin(), il.end() )
  {}
};

The goal here is simple: create a stack-based data type that stores a bunch of Ts, up to a cap, and can also hold fewer.

Now we replace your std::initializer_list with:

auto get_foos = [=] (Switch ss) -> capped_array<int,3> {
    switch (ss) {
        case Switch_1:
            return {foo_1, foo_2, foo_3};
        case Switch_2:
            return {foo_4, foo_5};
        case Switch_3:
            return {foo_6, foo_7};
        default:
            throw std::logic_error("invalid switch");
    }
};

and your code works. The free store is not used (no heap allocation).

A more advanced version would use an array of uninitialized data and manually construct each T.
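A rough sketch of what such a version might look like, assuming C++11 and a positive capacity. The class name raw_capped_array and its members are hypothetical, and only construction from a braced list, copying, and read-only iteration are covered; assignment and mutable access are left out.

```cpp
#include <cstddef>
#include <initializer_list>
#include <new>

// Hypothetical sketch: a fixed-capacity container whose elements are
// constructed on demand with placement new, so T need not be
// default-constructible.
template<class T, std::size_t max_size>
class raw_capped_array {
  static_assert(max_size > 0, "capacity must be positive");
  // Raw, suitably aligned bytes; no T exists here until it is constructed.
  alignas(T) unsigned char storage_[sizeof(T) * max_size];
  std::size_t used_ = 0;

  T* data() { return reinterpret_cast<T*>(storage_); }
  T const* data() const { return reinterpret_cast<T const*>(storage_); }

  // Destroy the constructed elements in reverse order.
  void clear() { while (used_ > 0) data()[--used_].~T(); }

  // Copy-construct elements from [first, last), truncating at max_size.
  void append(T const* first, T const* last) {
    try {
      for (; first != last && used_ < max_size; ++first) {
        ::new (static_cast<void*>(data() + used_)) T(*first);
        ++used_;
      }
    } catch (...) {
      clear();  // destroy whatever was already constructed
      throw;
    }
  }

public:
  raw_capped_array() = default;
  raw_capped_array(std::initializer_list<T> il) { append(il.begin(), il.end()); }
  raw_capped_array(raw_capped_array const& o) { append(o.begin(), o.end()); }
  raw_capped_array& operator=(raw_capped_array const&) = delete;  // omitted in this sketch
  ~raw_capped_array() { clear(); }

  T const* begin() const { return data(); }
  T const* end() const { return data() + used_; }
  std::size_t size() const { return used_; }
};
```

As with capped_array, extra elements beyond max_size are silently dropped. A fuller version would probably want to report that instead, and would add move support and assignment.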