为什么基于人造丝的并行处理比串行处理需要更多时间?
Why rayon-based parallel processing takes more time than serial processing?
学习Rayon,想比较斐波那契数列并行计算和串行计算的性能。这是我的代码:
use rayon;
use std::time::Instant;
fn main() {
let nth = 30;
let now = Instant::now();
let fib = fibonacci_serial(nth);
println!(
"[s] The {}th number in the fibonacci sequence is {}, elapsed: {}",
nth,
fib,
now.elapsed().as_micros()
);
let now = Instant::now();
let fib = fibonacci_parallel(nth);
println!(
"[p] The {}th number in the fibonacci sequence is {}, elapsed: {}",
nth,
fib,
now.elapsed().as_micros()
);
}
fn fibonacci_parallel(n: u64) -> u64 {
if n <= 1 {
return n;
}
let (a, b) = rayon::join(|| fibonacci_parallel(n - 2), || fibonacci_parallel(n - 1));
a + b
}
fn fibonacci_serial(n: u64) -> u64 {
if n <= 1 {
return n;
}
fibonacci_serial(n - 2) + fibonacci_serial(n - 1)
}
本来以为并行计算的运行时间会比串行计算的运行时间小,结果却相反:
# `s` stands for serial calculation and `p` for parallel
[s] The 30th number in the fibonacci sequence is 832040, elapsed: 12127
[p] The 30th number in the fibonacci sequence is 832040, elapsed: 990379
我对 serial/parallel 计算的实现会有缺陷。但如果不是,为什么我会看到这些结果?
我认为真正的原因是,您创建了 n²
个不好的线程。在每次调用 fibonacci_parallel
时,您都会为人造丝创建另一对线程,并且因为您在闭包中再次调用 fibonacci_parallel
,所以您会创建另一对线程。
这对于 OS/rayon 来说是非常可怕的。
解决这个问题的方法可能是这样的:
fn fibonacci_parallel(n: u64) -> u64 {
fn inner(n: u64) -> u64 {
if n <= 1 {
return n;
}
inner(n - 2) + inner(n - 1)
}
if n <= 1 {
return n;
}
let (a, b) = rayon::join(|| inner(n - 2), || inner(n - 1));
a + b
}
您创建了两个都执行内部函数的线程。加上这个我得到
op@VBOX /t/t/foo> cargo run --release 40
Finished release [optimized] target(s) in 0.03s
Running `target/release/foo 40`
[s] The 40th number in the fibonacci sequence is 102334155, elapsed: 1373741
[p] The 40th number in the fibonacci sequence is 102334155, elapsed: 847343
但如前所述,对于低数量并行执行是不值得的:
op@VBOX /t/t/foo> cargo run --release 20
Finished release [optimized] target(s) in 0.02s
Running `target/release/foo 20`
[s] The 10th number in the fibonacci sequence is 6765, elapsed: 82
[p] The 10th number in the fibonacci sequence is 6765, elapsed: 241
学习Rayon,想比较斐波那契数列并行计算和串行计算的性能。这是我的代码:
use rayon;
use std::time::Instant;
fn main() {
let nth = 30;
let now = Instant::now();
let fib = fibonacci_serial(nth);
println!(
"[s] The {}th number in the fibonacci sequence is {}, elapsed: {}",
nth,
fib,
now.elapsed().as_micros()
);
let now = Instant::now();
let fib = fibonacci_parallel(nth);
println!(
"[p] The {}th number in the fibonacci sequence is {}, elapsed: {}",
nth,
fib,
now.elapsed().as_micros()
);
}
fn fibonacci_parallel(n: u64) -> u64 {
if n <= 1 {
return n;
}
let (a, b) = rayon::join(|| fibonacci_parallel(n - 2), || fibonacci_parallel(n - 1));
a + b
}
fn fibonacci_serial(n: u64) -> u64 {
if n <= 1 {
return n;
}
fibonacci_serial(n - 2) + fibonacci_serial(n - 1)
}
本来以为并行计算的运行时间会比串行计算的运行时间小,结果却相反:
# `s` stands for serial calculation and `p` for parallel
[s] The 30th number in the fibonacci sequence is 832040, elapsed: 12127
[p] The 30th number in the fibonacci sequence is 832040, elapsed: 990379
我对 serial/parallel 计算的实现会有缺陷。但如果不是,为什么我会看到这些结果?
我认为真正的原因是,您创建了 n²
个不好的线程。在每次调用 fibonacci_parallel
时,您都会为人造丝创建另一对线程,并且因为您在闭包中再次调用 fibonacci_parallel
,所以您会创建另一对线程。
这对于 OS/rayon 来说是非常可怕的。
解决这个问题的方法可能是这样的:
fn fibonacci_parallel(n: u64) -> u64 {
fn inner(n: u64) -> u64 {
if n <= 1 {
return n;
}
inner(n - 2) + inner(n - 1)
}
if n <= 1 {
return n;
}
let (a, b) = rayon::join(|| inner(n - 2), || inner(n - 1));
a + b
}
您创建了两个都执行内部函数的线程。加上这个我得到
op@VBOX /t/t/foo> cargo run --release 40
Finished release [optimized] target(s) in 0.03s
Running `target/release/foo 40`
[s] The 40th number in the fibonacci sequence is 102334155, elapsed: 1373741
[p] The 40th number in the fibonacci sequence is 102334155, elapsed: 847343
但如前所述,对于低数量并行执行是不值得的:
op@VBOX /t/t/foo> cargo run --release 20
Finished release [optimized] target(s) in 0.02s
Running `target/release/foo 20`
[s] The 10th number in the fibonacci sequence is 6765, elapsed: 82
[p] The 10th number in the fibonacci sequence is 6765, elapsed: 241