为什么基于人造丝的并行处理比串行处理需要更多时间?

Why rayon-based parallel processing takes more time than serial processing?

学习Rayon,想比较斐波那契数列并行计算和串行计算的性能。这是我的代码:

use rayon;
use std::time::Instant;

fn main() {
    let nth = 30;
    let now = Instant::now();
    let fib = fibonacci_serial(nth);
    println!(
        "[s] The {}th number in the fibonacci sequence is {}, elapsed: {}",
        nth,
        fib,
        now.elapsed().as_micros()
    );

    let now = Instant::now();
    let fib = fibonacci_parallel(nth);
    println!(
        "[p] The {}th number in the fibonacci sequence is {}, elapsed: {}",
        nth,
        fib,
        now.elapsed().as_micros()
    );
}

fn fibonacci_parallel(n: u64) -> u64 {
    if n <= 1 {
        return n;
    }

    let (a, b) = rayon::join(|| fibonacci_parallel(n - 2), || fibonacci_parallel(n - 1));
    a + b
}

fn fibonacci_serial(n: u64) -> u64 {
    if n <= 1 {
        return n;
    }

    fibonacci_serial(n - 2) + fibonacci_serial(n - 1)
}

Run in Rust Playground

本来以为并行计算的运行时间会比串行计算的运行时间小,结果却相反:

# `s` stands for serial calculation and `p` for parallel
[s] The 30th number in the fibonacci sequence is 832040, elapsed: 12127
[p] The 30th number in the fibonacci sequence is 832040, elapsed: 990379

我对 serial/parallel 计算的实现会有缺陷。但如果不是,为什么我会看到这些结果?

我认为真正的原因是,您创建了 个不好的线程。在每次调用 fibonacci_parallel 时,您都会为人造丝创建另一对线程,并且因为您在闭包中再次调用 fibonacci_parallel,所以您会创建另一对线程。
这对于 OS/rayon 来说是非常可怕的。

解决这个问题的方法可能是这样的:

fn fibonacci_parallel(n: u64) -> u64 {
    fn inner(n: u64) -> u64 {
        if n <= 1 { 
            return n;
        }   

        inner(n - 2) + inner(n - 1)
    }   

    if n <= 1 {
        return n;
    }   

    let (a, b) = rayon::join(|| inner(n - 2), || inner(n - 1));
    a + b 
}

您创建了两个都执行内部函数的线程。加上这个我得到

op@VBOX /t/t/foo> cargo run --release 40
    Finished release [optimized] target(s) in 0.03s
     Running `target/release/foo 40`
[s] The 40th number in the fibonacci sequence is 102334155, elapsed: 1373741
[p] The 40th number in the fibonacci sequence is 102334155, elapsed: 847343

但如前所述,对于低数量并行执行是不值得的:

op@VBOX /t/t/foo> cargo run --release 20
    Finished release [optimized] target(s) in 0.02s
     Running `target/release/foo 20`
[s] The 10th number in the fibonacci sequence is 6765, elapsed: 82
[p] The 10th number in the fibonacci sequence is 6765, elapsed: 241