Shouldn't I see a difference in CPU usage between a single-threaded vs a multi-threaded websocketpp server?

Question

I'm using a multi-threaded websocketpp server configured like this:

Server::Server(int ep) {
    using websocketpp::lib::placeholders::_1;
    using websocketpp::lib::placeholders::_2;
    using websocketpp::lib::bind;

    Server::wspp_server.clear_access_channels(websocketpp::log::alevel::all);

    Server::wspp_server.init_asio();

    Server::wspp_server.set_open_handler(bind(&Server::on_open, this, _1));
    Server::wspp_server.set_close_handler(bind(&Server::on_close, this, _1));
    Server::wspp_server.set_message_handler(bind(&Server::on_message, this, _1, _2));

    try {
        Server::wspp_server.listen(ep);
    } catch (const websocketpp::exception &e){
        std::cout << "Error in Server::Server(int): " << e.what() << std::endl;
    }
    Server::wspp_server.start_accept();
}

void Server::run(int threadCount) {
    boost::thread_group tg;

    for (int i = 0; i < threadCount; i++) {
        tg.add_thread(new boost::thread(
            &websocketpp::server<websocketpp::config::asio>::run,
            &Server::wspp_server));
        std::cout << "Spawning thread " << (i + 1) << std::endl;
    }

    tg.join_all();
}

void Server::updateClients() {
    /*
       run updates
    */
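    // Take a shared (read) lock while iterating: on_open/on_close modify
    // conns from the server threads under the same mutex.
    boost::shared_lock<boost::shared_mutex> lock(Server::conns_mutex);
    for (websocketpp::connection_hdl hdl : Server::conns) {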
        try {
            std::string message = "personalized message for this client from the ran update above";
            wspp_server.send(hdl, message, websocketpp::frame::opcode::text);
        } catch (const websocketpp::exception &e) {
            std::cout << "Error in Server::updateClients(): " << e.what() << std::endl;
        }
    }
}

void Server::on_open(websocketpp::connection_hdl hdl) {
    boost::lock_guard<boost::shared_mutex> lock(Server::conns_mutex);
    Server::conns.insert(hdl);

    //do stuff


    //when the first client connects, start the update routine
    if (conns.size() == 1) {
        Server::run = true;
        bool *run = &(Server::run);
        std::thread([run] () {
            while (*run) {
                auto nextTime = std::chrono::steady_clock::now() + std::chrono::milliseconds(15);
                Server::updateClients();
                std::this_thread::sleep_until(nextTime);
            }
        }).detach();
    }
}

void Server::on_close(websocketpp::connection_hdl hdl) {
    boost::lock_guard<boost::shared_mutex> lock(Server::conns_mutex);
    Server::conns.erase(hdl);

    //do stuff

    //stop the update loop when all clients are gone
    if (conns.size() < 1)
        Server::run = false;
}

void Server::on_message(
        websocketpp::connection_hdl hdl,
        websocketpp::server<websocketpp::config::asio>::message_ptr msg) {
    boost::lock_guard<boost::shared_mutex> lock(Server::conns_mutex);

    //do stuff
}

I start the server with:

int port = 9000;
Server server(port);
server.run(/* number of threads */);

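The only substantial difference when connections are added is in message emission [wspp_server.send(...)]. A growing number of clients doesn't really add any internal computation. Only the number of messages to send increases.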

My problem is that the CPU usage doesn't seem to be that much different whether I use 1 or more threads.

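It doesn't matter whether I start the server with server.run(1) or server.run(4) (both on a dedicated server with a 4-core CPU). For a similar load, the CPU usage graph shows roughly the same percentage. I was expecting the usage to be lower with 4 threads running in parallel. Am I thinking of this the wrong way?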

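At some point, I got the sense that the parallelism really applies to the listening part more than to the emitting part. So I tried wrapping the send in a new thread (which I detach), so that it is independent of the sequence that requires it, but it didn't change anything on the graph.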

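Shouldn't I see a difference in the work the CPU does? If not, what am I doing wrong? Is there another step I'm missing to force messages to be emitted from different threads?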

Answer 1

"My problem is that the CPU usage doesn't seem to be that much different whether I use 1 or more threads."

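That's not a problem. It's a fact. It just means that the whole thing isn't CPU bound. That should be obvious, since it's network IO. In fact, for this reason, high-performance servers often dedicate only 1 thread to all IO tasks.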
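
To make "not CPU bound" concrete, here is a minimal sketch that compares wall-clock time with CPU time while a thread is only waiting. The sleep is a stand-in for blocking on network IO, and `Usage` and `measureWaiting` are hypothetical names chosen for illustration. It assumes a POSIX-like platform where `std::clock()` reports processor time used by the process; on some platforms (for example the Microsoft C runtime) `std::clock()` tracks wall time instead, so the comparison would not hold there.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

// Wall-clock time elapsed vs. CPU time consumed over the same interval.
struct Usage {
    double wallSeconds;
    double cpuSeconds;
};

// Hypothetical helper: blocks for `wait` (standing in for a thread that is
// waiting on IO) and reports how much wall time and CPU time went by.
Usage measureWaiting(std::chrono::milliseconds wait) {
    const auto wallStart = std::chrono::steady_clock::now();
    const std::clock_t cpuStart = std::clock();

    std::this_thread::sleep_for(wait);  // waiting, not computing

    const std::clock_t cpuEnd = std::clock();
    const auto wallEnd = std::chrono::steady_clock::now();

    return {std::chrono::duration<double>(wallEnd - wallStart).count(),
            static_cast<double>(cpuEnd - cpuStart) / CLOCKS_PER_SEC};
}
```

Under the stated assumption, `measureWaiting(std::chrono::milliseconds(200))` should report a `cpuSeconds` value far below `wallSeconds`: a thread that spends its time waiting has almost no CPU work that additional threads could take over.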

"I was expecting the usage to be lower with 4 threads running in parallel. Am I thinking of this the wrong way?"

Yes, it seems so. You don't expect to pay less when you split the bill 4 ways, either.

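In fact, much like at the dinner table, you often end up paying more, because of the overhead of splitting the load (the cost/tasks). Unless you need more CPU capacity or lower reaction times than a single thread can deliver, a single IO thread is (obviously) more efficient, because there is no scheduling overhead and/or context-switch penalty.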
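
The bill-splitting point can be sketched in code. The function below is a hypothetical illustration, not websocketpp code: each "message" is modelled as one increment of a counter, and `unitsPerformed` is a made-up name. It divides a fixed number of messages across a given number of threads and returns how many units of work were actually done.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Splits `messages` units of work across `threadCount` workers and returns
// the total number of units performed. One unit stands in for one send.
std::uint64_t unitsPerformed(std::uint64_t messages, unsigned threadCount) {
    assert(threadCount > 0);
    std::atomic<std::uint64_t> done{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        // Worker t handles messages t, t + threadCount, t + 2 * threadCount, ...
        workers.emplace_back([&done, messages, threadCount, t] {
            for (std::uint64_t i = t; i < messages; i += threadCount)
                done.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto &w : workers) w.join();
    return done.load();
}
```

`unitsPerformed(1000, 1)` and `unitsPerformed(1000, 4)` both return 1000. The work is the same size either way; more threads only change who does it and add thread creation and scheduling costs on top.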

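Another mental exercise:

If you run 100 threads, the processor will schedule them across your available cores in the best case.
Likewise, if there are other processes running on your system (there obviously always are), the processor might schedule your 4 threads all on the same logical core. Do you expect the CPU load to be lower? Why? (Hint: of course not.)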
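
As a rough sketch of the first point, the number of threads that can execute at the same instant is bounded by the number of logical cores, however many threads are created. `runnableAtOnce` is a hypothetical helper name. `std::thread::hardware_concurrency()` is only a hint and may return 0 when the value is unknown, which this sketch treats as a single core.

```cpp
#include <algorithm>
#include <thread>

// Upper bound on how many of `threads` can run simultaneously on a machine
// with `cores` logical cores. A core count of 0 (unknown) is treated as 1.
unsigned runnableAtOnce(unsigned threads, unsigned cores) {
    const unsigned usable = std::max(1u, cores);
    return std::min(threads, usable);
}

// Same bound, using the core count reported by the standard library.
unsigned runnableAtOnce(unsigned threads) {
    return runnableAtOnce(threads, std::thread::hardware_concurrency());
}
```

With 4 cores, `runnableAtOnce(100u, 4u)` is 4: the other 96 threads wait their turn. This is a best-case bound only, since the scheduler also has to share those cores with every other process on the system.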

c++

multithreading

boost-asio

websocket++

Background: What is the difference between concurrency, parallelism and asynchronous methods?