Rust vs Go concurrent webserver, why is Rust slow here?

I'm benchmarking the multi-threaded webserver example from the Rust book. For comparison I built something similar in Go and ran benchmarks against both with ApacheBench. Even though it's a simple example, the difference is far too large: the Go webserver doing the same work is 10x faster. Since I expected Rust to be faster, or at least at the same level, I tried several revisions using futures and smol (although my goal is to compare implementations that use only the standard library), but the results were almost identical. Can anyone suggest changes to the Rust implementation to make it faster without using a huge number of threads?

Here is the code I used: https://github.com/deepu105/concurrency-benchmarks

The tokio-http version is the slowest; the other 3 Rust versions give almost identical results.

Here are the benchmarks:

Rust (with 8 threads; with 100 threads the numbers are much closer to Go's):

❯ ab -c 100 -n 1000 http://localhost:8080/
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:        
Server Hostname:        localhost
Server Port:            8080

Document Path:          /
Document Length:        176 bytes

Concurrency Level:      100
Time taken for tests:   26.027 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      195000 bytes
HTML transferred:       176000 bytes
Requests per second:    38.42 [#/sec] (mean)
Time per request:       2602.703 [ms] (mean)
Time per request:       26.027 [ms] (mean, across all concurrent requests)
Transfer rate:          7.32 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   2.9      1      16
Processing:     4 2304 1082.5   2001    5996
Waiting:        0 2303 1082.7   2001    5996
Total:          4 2307 1082.1   2002    5997

Percentage of the requests served within a certain time (ms)
  50%   2002
  66%   2008
  75%   2018
  80%   3984
  90%   3997
  95%   4002
  98%   4005
  99%   5983
 100%   5997 (longest request)

Go:

ab -c 100 -n 1000 http://localhost:8080/
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:        
Server Hostname:        localhost
Server Port:            8080

Document Path:          /
Document Length:        174 bytes

Concurrency Level:      100
Time taken for tests:   2.102 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      291000 bytes
HTML transferred:       174000 bytes
Requests per second:    475.84 [#/sec] (mean)
Time per request:       210.156 [ms] (mean)
Time per request:       2.102 [ms] (mean, across all concurrent requests)
Transfer rate:          135.22 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   1.4      2       5
Processing:     0  203 599.8      3    2008
Waiting:        0  202 600.0      2    2008
Total:          0  205 599.8      5    2013

Percentage of the requests served within a certain time (ms)
  50%      5
  66%      7
  75%      8
  80%      8
  90%   2000
  95%   2003
  98%   2005
  99%   2010
 100%   2013 (longest request)

I only compared your "rustws" and the Go version. In Go you have an unbounded number of goroutines (even though you confine them all to a single CPU core), whereas in rustws you create a thread pool with 8 threads.

Since your request handler sleeps for 2 seconds on every 10th request, you have limited the rustws version to 8 × 10 / 2 = 40 requests per second, which is exactly what you see in the ab results. Go doesn't suffer from this artificial bottleneck, so it shows you the maximum number of requests it can handle on a single CPU core.
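The arithmetic behind that cap can be written out explicitly. A minimal sketch (assuming, as in the code, that the 9 non-sleeping requests in each batch of 10 are effectively instant, so each thread's throughput is dominated by the 2-second sleep):

```rust
/// Rough throughput ceiling for a blocking thread pool where every
/// `batch`-th request sleeps for `sleep_secs` seconds and the rest
/// complete almost instantly.
fn max_requests_per_sec(threads: u64, batch: u64, sleep_secs: u64) -> u64 {
    // Each thread completes roughly `batch` requests per `sleep_secs`
    // seconds, because one request in every batch blocks the whole
    // thread for the sleep.
    threads * batch / sleep_secs
}

fn main() {
    // 8 threads * 10 requests / 2 seconds = 40 requests/second,
    // which matches the ~38 req/s that ab reported for rustws.
    println!("{}", max_requests_per_sec(8, 10, 2));
}
```

This is why adding more threads (or going async) closes the gap: the ceiling scales with the number of requests that can be "parked" in the sleep concurrently.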

I was finally able to get comparable results in Rust using the async_std lib:
❯ ab -c 100 -n 1000 http://localhost:8080/
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:        
Server Hostname:        localhost
Server Port:            8080

Document Path:          /
Document Length:        176 bytes

Concurrency Level:      100
Time taken for tests:   2.094 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      195000 bytes
HTML transferred:       176000 bytes
Requests per second:    477.47 [#/sec] (mean)
Time per request:       209.439 [ms] (mean)
Time per request:       2.094 [ms] (mean, across all concurrent requests)
Transfer rate:          90.92 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   1.7      2       7
Processing:     0  202 599.7      2    2002
Waiting:        0  201 600.1      1    2002
Total:          0  205 599.7      5    2007

Percentage of the requests served within a certain time (ms)
  50%      5
  66%      6
  75%      9
  80%      9
  90%   2000
  95%   2003
  98%   2004
  99%   2006
 100%   2007 (longest request)

Here is the implementation:

use async_std::net::TcpListener;
use async_std::net::TcpStream;
use async_std::prelude::*;
use async_std::task;
use std::fs;
use std::time::Duration;

#[async_std::main]
async fn main() {
    let mut count = 0;

    let listener = TcpListener::bind("127.0.0.1:8080").await.unwrap(); // set listen port

    loop {
        count += 1;
        let count_n = Box::new(count);
        let (stream, _) = listener.accept().await.unwrap();
        task::spawn(handle_connection(stream, count_n)); // spawn a new task to handle the connection
    }
}

async fn handle_connection(mut stream: TcpStream, count: Box<i64>) {
    // Read up to 1024 bytes of the request from the stream
    let mut buffer = [0; 1024];
    stream.read(&mut buffer).await.unwrap();

    // add 2 second delay to every 10th request
    if (*count % 10) == 0 {
        println!("Adding delay. Count: {}", count);
        task::sleep(Duration::from_secs(2)).await;
    }

    let contents = fs::read_to_string("hello.html").unwrap(); // read html file

    let response = format!("HTTP/1.1 200 OK\r\n\r\n{}", contents);
    stream.write_all(response.as_bytes()).await.unwrap(); // write_all ensures the whole response is written, unlike write
    stream.flush().await.unwrap();
}
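For anyone trying to compile this: the `#[async_std::main]` attribute lives behind a feature flag. A minimal `Cargo.toml` dependency sketch (assuming async-std 1.x; the version pin is illustrative):

```toml
[dependencies]
async-std = { version = "1", features = ["attributes"] }
```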