Concurrent program compiled with clang runs fine, but hangs with gcc

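I wrote a class to share a limited number of resources (for example, network interfaces) among a larger number of threads. The resources are pooled: if one is free it is lent to the requesting thread, otherwise the thread waits on a condition_variable. Nothing fancy: apart from scoped_lock, which requires C++17, it should be good old C++11.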

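Both gcc 10.2 and clang 11 compile the test main without complaint, but while the executable produced by clang behaves pretty much as expected, the one produced by gcc hangs without consuming any CPU (a deadlock?).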

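With the help of https://godbolt.org/ I tried older versions of gcc as well as icc (passing the options -O3 -std=c++17 -pthread). All of them reproduce the bad result, whereas clang shows the correct behaviour there too.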

I would like to know whether I made a mistake or whether the code triggers a compiler bug, and how to work around the problem.

#include <iostream>
#include <vector>
#include <stdexcept>
#include <mutex>
#include <condition_variable>

template <typename T>
class Pool {
///////////////////////////
  class Borrowed {
    friend class Pool<T>;
    Pool<T>& pool;
    const size_t id;
    T * val;

    public:
    Borrowed(Pool & p, size_t i, T& v): pool(p), id(i), val(&v) {}
    ~Borrowed() { release(); }
  
    T& get() const {
      if (!val) throw std::runtime_error("Borrowed::get() this resource was collected back by the pool");
      return *val;
    }

    void release() { pool.collect(*this); }
  };
///////////////////////////    
  struct Resource {
    T val;
    bool available = true;
    Resource(T v): val(std::move(v)) {}
  };
///////////////////////////

  std::vector<Resource> vres;
  size_t hint = 0;

  std::condition_variable cv;
  std::mutex mtx;
  size_t available_cnt;

  public:

  Pool(std::initializer_list<T> l): available_cnt(l.size()) {
    vres.reserve(l.size());
    for (T t: l) {
      vres.emplace_back(std::move(t));
    }
std::cout << "Pool has size " << vres.size() << std::endl;
  }

  ~Pool() {
    for ( auto & res: vres ) {
      if ( ! res.available ) {
        std::cerr << "WARNING Pool::~Pool resources are still in use\n";
      }
    }
  }

  Borrowed borrow() {
    std::unique_lock<std::mutex> lk(mtx);
    cv.wait(lk, [&](){return available_cnt > 0;});
    if ( vres[hint].available ) {
      // quick path, if hint points to an available resource
std::cout << "hint good" << std::endl;
      vres[hint].available = false;
      --available_cnt;
      Borrowed b(*this, hint, vres[hint].val);
      if ( hint + 1 < vres.size() ) ++hint;
      return b; // <--- gcc seems to hang here
    } else {
      // full scan to find the available resource
std::cout << "hint bad" << std::endl;
      for ( hint = 0; hint < vres.size(); ++hint ) {
        if ( vres[hint].available ) {
          vres[hint].available = false;
          --available_cnt;
          return Borrowed(*this, hint, vres[hint].val);
        }
      }
    }
    throw std::runtime_error("Pool::borrow() no resource is available - internal logic error");
  }

  void collect(Borrowed & b) {
    if ( &(b.pool) != this ) 
      throw std::runtime_error("Pool::collect() trying to collect resource owned by another pool!");
    if ( b.val ) {
      b.val = nullptr;
      {
        std::scoped_lock<std::mutex> lk(mtx);
        hint = b.id;
        vres[hint].available = true;
        ++available_cnt;
      }
      cv.notify_one();
    }
  }
};

///////////////////////////////////////////////////////////////////

#include <thread>
#include <chrono>

int main() {
  Pool<std::string> pool{"hello","world"};

  std::vector<std::thread> vt;
  for (int i = 10; i > 0; --i) {
    vt.emplace_back( [&pool, i]()
      { 
        auto res = pool.borrow();
        std::this_thread::sleep_for(std::chrono::milliseconds(i*300));
        std::cout << res.get() << std::endl;
      }
    );
  }

  for (auto & t: vt) t.join();

  return 0;
}

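You are running into undefined behaviour, because you are effectively re-locking a mutex that the same thread already holds. With MSVC I got a helpful call stack that pinpointed this. Here is a fixed example that works (I think; it works for me now). See the changes in the borrow() method. A further redesign may be worthwhile, since locking inside a destructor is questionable: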

#include <iostream>
#include <vector>
#include <stdexcept>
#include <mutex>
#include <condition_variable>

template <typename T>
class Pool {
  ///////////////////////////
  class Borrowed {
    friend class Pool<T>;
    Pool<T>& pool;
    const size_t id;
    T * val;

  public:
    Borrowed(Pool & p, size_t i, T& v) : pool(p), id(i), val(&v) {}
    ~Borrowed() { release(); }

    T& get() const {
      if (!val) throw std::runtime_error("Borrowed::get() this resource was collected back by the pool");
      return *val;
    }

    void release() { pool.collect(*this); }
  };
  ///////////////////////////    
  struct Resource {
    T val;
    bool available = true;
    Resource(T v) : val(std::move(v)) {}
  };
  ///////////////////////////

  std::vector<Resource> vres;
  size_t hint = 0;

  std::condition_variable cv;
  std::mutex mtx;
  size_t available_cnt;

public:

  Pool(std::initializer_list<T> l) : available_cnt(l.size()) {
    vres.reserve(l.size());
    for (T t : l) {
      vres.emplace_back(std::move(t));
    }
    std::cout << "Pool has size " << vres.size() << std::endl;
  }

  ~Pool() {
    for (auto & res : vres) {
      if (!res.available) {
        std::cerr << "WARNING Pool::~Pool resources are still in use\n";
      }
    }
  }

  Borrowed borrow() {
    
    std::unique_lock<std::mutex> lk(mtx);
    while (available_cnt == 0) cv.wait(lk);

    if (vres[hint].available) {
      // quick path, if hint points to an available resource
      std::cout << "hint good" << std::endl;
      vres[hint].available = false;
      --available_cnt;
      Borrowed b(*this, hint, vres[hint].val);
      if (hint + 1 < vres.size()) ++hint;
      lk.unlock();
      return b; // lk is already unlocked, so destroying the local b here no longer deadlocks
    }
    else {
      // full scan to find the available resource
      std::cout << "hint bad" << std::endl;
      for (hint = 0; hint < vres.size(); ++hint) {
        if (vres[hint].available) {
          vres[hint].available = false;
          --available_cnt;
          lk.unlock();
          return Borrowed(*this, hint, vres[hint].val);
        }
      }
    }
    throw std::runtime_error("Pool::borrow() no resource is available - internal logic error");
  }

  void collect(Borrowed & b) {
    if (&(b.pool) != this)
      throw std::runtime_error("Pool::collect() trying to collect resource owned by another pool!");
    if (b.val) {
      b.val = nullptr;
      {
        std::scoped_lock<std::mutex> lk(mtx);
        hint = b.id;
        vres[hint].available = true;
        ++available_cnt;
        cv.notify_one();
      }
    }
  }
};

///////////////////////////////////////////////////////////////////

#include <thread>
#include <chrono>



////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
int main()
{
  try
  {
    Pool<std::string> pool{ "hello","world" };

    std::vector<std::thread> vt;
    for (int i = 10; i > 0; --i) {
      vt.emplace_back([&pool, i]()
        {
          auto res = pool.borrow();
          std::this_thread::sleep_for(std::chrono::milliseconds(i * 300));
          std::cout << res.get() << std::endl;
        }
      );
    }

    for (auto & t : vt) t.join();

    return 0;
  }
  catch(const std::exception& e)
  {
    std::cout << "exception occurred: " << e.what();
  }
  return 0;
}

A locking destructor combined with a missed NRVO caused the problem (thanks to Secundi for pointing this out in the comments).

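If the compiler skips NRVO, the following lines of the if branch copy b into the return value and then call the destructor of the local b. Locals are destroyed in reverse order of declaration, so that destructor runs while the unique_lock still holds the mutex. It calls collect(), which tries to acquire the same mutex again, and the thread deadlocks.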

Borrowed b(*this, hint, vres[hint].val);
if ( hint + 1 < vres.size() ) ++hint;
return b; // <--- gcc seems to hang here
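The destruction order behind this can be shown in isolation. The sketch below is not the original Pool: FakeLock and Probe are hypothetical stand-ins for std::unique_lock and Borrowed, and a plain bool replaces the mutex so that nothing can actually deadlock. It only illustrates that an object declared after the lock is destroyed while the lock is still held.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for std::unique_lock: tracks "held" in a plain bool.
struct FakeLock {
  bool& held;
  explicit FakeLock(bool& h) : held(h) { held = true; }
  ~FakeLock() { held = false; }
};

// Hypothetical stand-in for Borrowed: the destructor records what it observes.
struct Probe {
  bool& held;
  std::vector<std::string>& log;
  Probe(bool& h, std::vector<std::string>& l) : held(h), log(l) {}
  ~Probe() {
    log.push_back(held ? "~Probe: lock still held" : "~Probe: lock free");
  }
};

// Mirrors the layout of borrow(): lock first, then the local object.
std::vector<std::string> destruction_order_demo() {
  bool held = false;
  std::vector<std::string> log;
  {
    FakeLock lk(held);   // declared first, destroyed last
    Probe b(held, log);  // declared second, destroyed first
  }                      // ~Probe runs here, before ~FakeLock
  return log;
}
```

Calling destruction_order_demo() yields a single log entry saying the lock was still held. In the real Pool, the corresponding destructor calls collect(), which locks a std::mutex the thread already owns.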

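The crucial point here is to avoid destroying b at all. Manually releasing the unique_lock before returning does avoid the deadlock, but b's destructor still marks the pooled resource as available while it has in fact just been borrowed, which makes the code incorrect.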

A possible fix is to replace the lines above with:

const auto tmp = hint;
if ( hint + 1 < vres.size() ) ++hint;
return Borrowed(*this, tmp, vres[tmp].val);

Another possibility (which does not exclude the former) is to delete Borrowed's (evil) copy constructor and provide only a move constructor:

Borrowed(const Borrowed &) = delete;
Borrowed(Borrowed && b): pool(b.pool), id(b.id), val(b.val) { b.val = nullptr; }
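To see why a move-only handle is safe to return by name, here is a minimal sketch under simplifying assumptions. MiniPool, Handle, give_back() and available() are hypothetical names, the pool only counts free slots, and it asserts instead of waiting on a condition_variable. Whether or not NRVO is applied, the moved-from local holds a null pointer, so its destructor never touches the mutex and never hands the slot back.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>

// Hypothetical, stripped-down pool: it only counts free slots.
class MiniPool {
  std::mutex mtx;
  std::size_t free_slots;

public:
  class Handle {
    MiniPool* pool;  // nullptr once moved-from

  public:
    explicit Handle(MiniPool& p) : pool(&p) {}
    Handle(const Handle&) = delete;             // no accidental copies
    Handle& operator=(const Handle&) = delete;
    Handle(Handle&& other) noexcept
        : pool(std::exchange(other.pool, nullptr)) {}
    ~Handle() {
      if (pool) pool->give_back();  // only the live handle returns the slot
    }
  };

  explicit MiniPool(std::size_t n) : free_slots(n) {}

  // Assumes a slot is free; the real Pool would wait on a condition_variable.
  Handle borrow() {
    std::unique_lock<std::mutex> lk(mtx);
    assert(free_slots > 0);
    --free_slots;
    Handle h(*this);
    return h;  // elided or moved; a moved-from h is inert when destroyed
  }

  std::size_t available() {
    std::lock_guard<std::mutex> lk(mtx);
    return free_slots;
  }

private:
  void give_back() {
    std::lock_guard<std::mutex> lk(mtx);
    ++free_slots;
  }
};
```

With a pool of two slots, available() reports 1 while a borrowed handle is alive and 2 again once it goes out of scope.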