How to avoid a deadlock caused by a thread panic?

Question

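My server uses a Barrier to notify the client when it is safe to attempt to connect. Without the barrier we risk random failures, because there is no guarantee that the server socket will already have been bound.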

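Now suppose the server panics, for example because it tries to bind the socket to port 80 (which on most Unix-like systems requires elevated privileges). The client will then be stuck in wait() forever. We cannot join() the server thread to find out whether it panicked, because join() is a blocking operation: if we join(), we will never get to connect().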

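Considering that the std::sync API does not seem to provide methods with a timeout, what is the proper way to do this kind of synchronization?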

This is just an MCVE that demonstrates the problem. I ran into a similar situation in a unit test: it runs forever.

use std::{
    io::prelude::*,
    net::{SocketAddr, TcpListener, TcpStream},
    sync::{Arc, Barrier},
};

fn main() {
    let port = 9090;
    //let port = 80;

    let barrier = Arc::new(Barrier::new(2));
    let server_barrier = barrier.clone();

    let client_sync = move || {
        barrier.wait();
    };

    let server_sync = Box::new(move || {
        server_barrier.wait();
    });

    server(server_sync, port);
    //server(Box::new(|| { no_sync() }), port); //use to test without synchronisation

    client(&client_sync, port);
    //client(&no_sync, port); //use to test without synchronisation
}

fn no_sync() {
    // do nothing in order to demonstrate the need for synchronization
}

fn server(sync: Box<dyn Fn() + Send + Sync>, port: u16) {
    std::thread::spawn(move || {
        // there is no guarantee when the OS will schedule this thread; the delay makes the race 100% reproducible
        std::thread::sleep(std::time::Duration::from_millis(100));
        let addr = SocketAddr::from(([127, 0, 0, 1], port));
        let socket = TcpListener::bind(&addr).unwrap();
        println!("server socket bound");
        sync();

        let (mut client, _) = socket.accept().unwrap();

        client.write_all(b"hello mcve").unwrap();
    });
}

fn client(sync: &dyn Fn(), port: u16) {
    sync();

    let addr = SocketAddr::from(([127, 0, 0, 1], port));
    let mut socket = TcpStream::connect(&addr).unwrap();
    println!("client socket connected");

    let mut buf = String::new();
    socket.read_to_string(&mut buf).unwrap();
    println!("client received: {}", buf);
}

Answer 1

I would use a Condvar here instead of a Barrier.
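A rough sketch of that swap, assuming a `Mutex<bool>` readiness flag paired with a `Condvar` (the names `Ready`, `signal_ready` and `wait_ready` are hypothetical, and the socket bind is replaced by a short sleep so it runs without the network). On its own this still hangs if the server panics before signalling, exactly like the Barrier; it only provides the structure that the options listed next build on.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Readiness flag shared by both threads: `true` once the server's setup
/// (in the question, `TcpListener::bind`) has succeeded.
type Ready = Arc<(Mutex<bool>, Condvar)>;

/// Server side: set the flag, then wake the waiting client.
fn signal_ready(ready: &Ready) {
    let (lock, cvar) = &**ready;
    *lock.lock().unwrap() = true;
    cvar.notify_one();
}

/// Client side: block until the flag is set. `wait_while` re-checks the flag,
/// so a notify that happens before the client starts waiting is not lost.
fn wait_ready(ready: &Ready) {
    let (lock, cvar) = &**ready;
    let guard = lock.lock().unwrap();
    let _guard = cvar.wait_while(guard, |flag| !*flag).unwrap();
}

fn main() {
    let ready: Ready = Arc::new((Mutex::new(false), Condvar::new()));
    let server_ready = Arc::clone(&ready);

    let server = thread::spawn(move || {
        // Stand-in for binding the listening socket.
        thread::sleep(Duration::from_millis(50));
        signal_ready(&server_ready);
    });

    wait_ready(&ready);
    println!("server is ready; the client can connect now");
    server.join().unwrap();
}
```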

To actually solve your problem, I see at least three possible solutions:

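1. Use Condvar::wait_timeout and set the timeout to a reasonable duration (e.g. 1 second, which should be more than enough to bind to the port).
2. Use the same approach as above, but with a shorter timeout (e.g. 10 ms), and on every wake-up check whether the Mutex is poisoned (which happens if the server thread panics while holding the lock).
3. Use a plain Mutex (make sure that the other thread locks it first) and then use Mutex::try_lock to check whether the Mutex is poisoned.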
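A minimal sketch of options 1 and 2, under a few assumptions: the readiness state is a `Mutex<bool>` plus a `Condvar`, the server takes the lock before the step that can panic (otherwise nothing gets poisoned), and the bind call is replaced by a hypothetical `fail` flag so the example runs without opening a socket. `WaitOutcome`, `spawn_server` and the two wait functions are illustrative names, not std APIs.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Why the client stopped waiting for the server.
#[derive(Debug, PartialEq)]
enum WaitOutcome {
    Ready,
    TimedOut,
    ServerPanicked,
}

/// Option 1: a single wait with a generous upper bound (e.g. 1 s).
fn wait_with_timeout(pair: &(Mutex<bool>, Condvar), timeout: Duration) -> WaitOutcome {
    let (lock, cvar) = pair;
    let guard = match lock.lock() {
        Ok(guard) => guard,
        Err(_) => return WaitOutcome::ServerPanicked, // already poisoned
    };
    match cvar.wait_timeout_while(guard, timeout, |ready| !*ready) {
        Ok((_, result)) if result.timed_out() => WaitOutcome::TimedOut,
        Ok(_) => WaitOutcome::Ready,
        Err(_) => WaitOutcome::ServerPanicked, // poisoned while we were waiting
    }
}

/// Option 2: short waits in a loop; each wake-up re-locks the mutex, and
/// re-locking reports poisoning if the server panicked while holding it.
fn wait_until_ready_or_poisoned(pair: &(Mutex<bool>, Condvar), tick: Duration) -> WaitOutcome {
    let (lock, cvar) = pair;
    let mut guard = match lock.lock() {
        Ok(guard) => guard,
        Err(_) => return WaitOutcome::ServerPanicked,
    };
    while !*guard {
        guard = match cvar.wait_timeout(guard, tick) {
            Ok((guard, _)) => guard,
            Err(_) => return WaitOutcome::ServerPanicked,
        };
    }
    WaitOutcome::Ready
}

/// Hypothetical server thread. It takes the lock *before* the step that can
/// panic, so a panic during "binding" poisons the mutex.
fn spawn_server(pair: Arc<(Mutex<bool>, Condvar)>, fail: bool) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let (lock, cvar) = &*pair;
        let mut ready = lock.lock().unwrap();
        if fail {
            // Stand-in for `TcpListener::bind(...).unwrap()` failing on port 80.
            panic!("bind failed");
        }
        *ready = true;
        cvar.notify_one();
    })
}

fn main() {
    let tick = Duration::from_millis(10);
    for fail in [false, true] {
        let pair = Arc::new((Mutex::new(false), Condvar::new()));
        let server = spawn_server(Arc::clone(&pair), fail);
        let outcome = wait_until_ready_or_poisoned(&pair, tick);
        println!("server fails: {fail} -> {outcome:?}");
        // Joining is fine now: the server has either signalled or is unwinding.
        assert_eq!(server.join().is_err(), fail);
    }

    let pair = Arc::new((Mutex::new(false), Condvar::new()));
    let server = spawn_server(Arc::clone(&pair), false);
    println!("option 1 -> {:?}", wait_with_timeout(&pair, Duration::from_secs(1)));
    server.join().unwrap();
}
```

If the server could panic before it takes the lock, option 2's loop would never see a poisoned mutex and would wait forever, whereas option 1's upper bound still lets the client give up. The panicking thread's message is printed to stderr by the default panic hook.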

I think solution 1 or 2 is preferable to the third, because you avoid having to make sure that the other thread locked the Mutex first.
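For comparison, a sketch of option 3. The `AtomicBool` called `has_lock` is a hypothetical stand-in for the "make sure the other thread locked it first" step: the client must not call `try_lock` before the server holds the mutex, or it would take the lock itself and wrongly conclude that the server is ready. If the server panicked before setting that flag, the spin loop would never end, which is the extra fragility that makes options 1 and 2 preferable. As before, the bind is replaced by a hypothetical `fail` flag.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex, TryLockError};
use std::thread::{self, JoinHandle};
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum WaitOutcome {
    Ready,
    ServerPanicked,
}

/// Hypothetical server: hold the lock for the whole "bind" step and release it
/// once the socket would be bound. A panic while holding it poisons the mutex.
fn spawn_server(lock: Arc<Mutex<()>>, has_lock: Arc<AtomicBool>, fail: bool) -> JoinHandle<()> {
    thread::spawn(move || {
        let guard = lock.lock().unwrap();
        has_lock.store(true, Ordering::Release);
        if fail {
            // Stand-in for `TcpListener::bind(...).unwrap()` failing.
            panic!("bind failed");
        }
        drop(guard); // lock released: the "socket" is bound
    })
}

fn wait_via_try_lock(lock: &Mutex<()>, has_lock: &AtomicBool) -> WaitOutcome {
    // Option 3's extra requirement: don't touch the mutex until the server holds it.
    while !has_lock.load(Ordering::Acquire) {
        thread::yield_now();
    }
    loop {
        match lock.try_lock() {
            Ok(_) => return WaitOutcome::Ready, // released normally
            Err(TryLockError::Poisoned(_)) => return WaitOutcome::ServerPanicked,
            Err(TryLockError::WouldBlock) => thread::sleep(Duration::from_millis(1)),
        }
    }
}

fn main() {
    for fail in [false, true] {
        let lock = Arc::new(Mutex::new(()));
        let has_lock = Arc::new(AtomicBool::new(false));
        let server = spawn_server(Arc::clone(&lock), Arc::clone(&has_lock), fail);
        let outcome = wait_via_try_lock(&lock, &has_lock);
        println!("server fails: {fail} -> {outcome:?}");
        assert_eq!(server.join().is_err(), fail);
    }
}
```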
