Did my compiler ignore my unused static thread_local class member?

Question

I want to do some thread registration in my class, so I decided to add a check using the thread_local feature:

#include <iostream>
#include <thread>

class Foo {
 public:
  Foo() {
    std::cout << "Foo()" << std::endl;
  }
  ~Foo() {
    std::cout << "~Foo()" << std::endl;
  }
};

class Bar {
 public:
  Bar() {
    std::cout << "Bar()" << std::endl;
    //foo;
  }
  ~Bar() {
    std::cout << "~Bar()" << std::endl;
  }
 private:
  static thread_local Foo foo;
};

thread_local Foo Bar::foo;

void worker() {
  {
    std::cout << "enter block" << std::endl;
    Bar bar1;
    Bar bar2;
    std::cout << "exit block" << std::endl;
  }
}

int main() {
  std::thread t1(worker);
  std::thread t2(worker);
  t1.join();
  t2.join();
  std::cout << "thread died" << std::endl;
}

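The code is simple. My Bar class has a static thread_local member foo. If the static thread_local Foo foo is constructed, that means a thread has been created.

But when I run the code, nothing from Foo() is printed. If I uncomment the line in Bar's constructor that uses foo, the code works as expected.

I tried this on GCC (7.4.0) and Clang (6.0.0) and the results are the same. My guess is that the compiler finds that foo is unused and does not generate an instance. So:

Did the compiler ignore the static thread_local member? How can I debug this?
If so, why does an ordinary static member not have this problem?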

Answer 1

There is nothing wrong with your observation. [basic.stc.static]/2 prohibits eliminating variables with static storage duration:

If a variable with static storage duration has initialization or a destructor with side effects, it shall not be eliminated even if it appears to be unused, except that a class object or its copy/move may be eliminated as specified in [class.copy].

This restriction does not exist for the other storage durations. In fact, [basic.stc.thread]/2 says:

A variable with thread storage duration shall be initialized before its first odr-use and, if constructed, shall be destroyed on thread exit.

This suggests that a variable with thread storage duration need not be constructed unless it is odr-used.
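As a minimal sketch of what that rule guarantees, the snippet below counts constructions of a thread_local static member. The names Probe, Holder, probe_constructed, and constructions_after_odr_use are hypothetical and not from the question. The only thing the standard promises here is that the thread which odr-uses the member sees it initialized first; whether a thread that never touches it constructs one is left to the implementation.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names, for illustration only.
std::atomic<int> probe_constructed{0};

struct Probe {
  Probe() { ++probe_constructed; }  // side effect we can observe
  void touch() {}                   // calling this odr-uses the object
};

struct Holder {
  static thread_local Probe probe;
};

thread_local Probe Holder::probe;

// Starts one thread that odr-uses Holder::probe, waits for it, and
// returns how many Probe objects have been constructed so far.
int constructions_after_odr_use() {
  std::thread t([] { Holder::probe.touch(); });
  t.join();
  return probe_constructed.load();
}
```

Calling constructions_after_odr_use() from main should report at least one construction. A thread that never names Holder::probe is allowed to skip it, which matches what the question observed on GCC and Clang.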

But why this discrepancy?

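For static storage duration, there is only one instance of a variable per program. The side effects of constructing it can be significant (a bit like a program-wide constructor), so the side effects are required.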
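For contrast, here is a small sketch of the static storage duration case, again with hypothetical names (Noisy, Owner, static_constructed, static_constructions). The member is never referenced, yet its constructor has a side effect, so it may not be eliminated.

```cpp
#include <atomic>

// Hypothetical names, for illustration only.
std::atomic<int> static_constructed{0};

struct Noisy {
  Noisy() { ++static_constructed; }  // observable side effect
};

struct Owner {
  static Noisy unused;  // ordinary static member, never referenced elsewhere
};

Noisy Owner::unused;

// Reports how many Noisy objects exist; there is one per program.
int static_constructions() { return static_constructed.load(); }
```

On the usual implementations the object is already constructed by the time main starts, so static_constructions() reports one instance even though nothing uses Owner::unused.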

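For thread storage duration, however, there is a problem: an algorithm may start a lot of threads, and for most of those threads the variable is completely irrelevant. It would be hilarious if an external physics simulation library that calls std::reduce(std::execution::par_unseq, first, last) ended up creating a lot of foo instances, right?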

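Of course, there can be legitimate uses for the side effects of constructing a variable with thread storage duration that is not odr-used (a thread tracker, for example). However, the advantage of guaranteeing this is not enough to compensate for the drawback described above, so these variables are allowed to be eliminated as long as they are not odr-used. (Your compiler can choose not to do that, though. You can also write your own wrapper around std::thread to take care of this.)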
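One possible shape for such a wrapper is sketched below. Everything here (ThreadRegistration, make_tracked_thread, live_threads, total_registered, run_two_tracked_threads) is a hypothetical name chosen for illustration, not an existing library API. The sketch relies on the rule that a block-scope thread_local variable is initialized the first time control passes through its declaration, so registration happens in every thread started through the wrapper without depending on an odr-use elsewhere.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Hypothetical counters standing in for a real registry.
std::atomic<int> live_threads{0};
std::atomic<int> total_registered{0};

// Registers the current thread on construction and unregisters it
// when the thread exits and its thread_local objects are destroyed.
struct ThreadRegistration {
  ThreadRegistration() { ++live_threads; ++total_registered; }
  ~ThreadRegistration() { --live_threads; }
};

// Assumed wrapper: starts a std::thread that registers itself first,
// then runs the user's callable.
template <typename F>
std::thread make_tracked_thread(F f) {
  return std::thread([f = std::move(f)]() mutable {
    // Initialized when control first reaches this line in each thread.
    thread_local ThreadRegistration registration;
    f();
  });
}

// Example use: two tracked workers, then report how many registered.
int run_two_tracked_threads() {
  std::thread t1 = make_tracked_thread([] {});
  std::thread t2 = make_tracked_thread([] {});
  t1.join();
  t2.join();
  return total_registered.load();
}
```

The design choice is to move the registration into the thread's entry function, where it always executes, instead of hoping that some class with a static thread_local member happens to be used on that thread.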

Answer 2

I found this passage in "ELF Handling For Thread-Local Storage", which supports @L.F.'s answer:

In addition the run-time support should avoid creating the thread-local storage if it is not necessary. For instance, a loaded module might only be used by one thread of the many which make up the process. It would be a waste of memory and time to allocate the storage for all threads. A lazy method is wanted. This is not much extra burden since the requirement to handle dynamically loaded objects already requires recognizing storage which is not yet allocated. This is the only alternative to stopping all threads and allocating storage for all threads before letting them run again.

c++

compiler-optimization

thread-local-storage