如何使用Rayon并行计算PI

How to use Rayon for parallel calculation of PI

我的代码可以在没有 Rayon 的情况下使用 Ramanujan 公式计算 Pi,我想为并行线程实现 Rayon,因为这是我的项目。

我知道我需要用这个

use rayon::prelude::*;
fn sum_of_squares(input: &[f64]) ->f64 {
    for i in total.iter() // <-- just change that!
         .map(|&i| i * i)
         .sum()
}

但我还是不知道该怎么办。

这是我的代码

use rayon::prelude::*;

pub fn factorial(n: f64) -> f64 {
    if n == 0.0 {
        return 1.00;
    } else {
        let result: f64 = factorial(n - 1.0) * n;

        return result;
    }
}

pub fn estimate_pi() -> f64 {
    let mut total: f64 = 0.0;
    let mut k: f64 = 0.0;
    let factor1: f64 = 2.0;
    let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;

    loop {
        let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
        let numm: f64 = 396.0;
        let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
        let term: f64 = factor * num / den;

        total += term;

        if term.abs() < 1e-15 {
            break;
        }

        k += 1.0;
    }

    return 1.0 / total;
}

fn main() {
    println!("{}", estimate_pi());
}

Playground

第一步是通过使每次迭代独立,使您的算法可并行化。我做的第一件事是添加调试语句来打印 k:

的最终值
        k += 1.0;
    }

    dbg!(k);

    return 1.0 / total;

打印了 k = 2,因此我可以使用它为每次迭代创建独立的 k 值范围:

(0..=iterations) // [0, 1, 2] for iterations = 2

我们将迭代该范围内的元素,而不是使用您拥有的 epsilon 检查:

pub fn estimate_pi(iterations: usize) -> f64 {
    let mut total: f64 = 0.0;
    let factor1: f64 = 2.0;
    let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;

    for i in 0..=iterations {
        let k: f64 = i as f64;

        let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
        let numm: f64 = 396.0;
        let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
        let term: f64 = factor * num / den;

        total += term;
    }

    return 1.0 / total;
}

// call estimate_pi(2)

Total 只是所有迭代的总和,因此我们可以将其从循环转换为 map-reduce 操作。对于范围内的每个数字,我们计算 term。然后,我们使用fold(减少)来计算总和。

pub fn estimate_pi(iterations: usize) -> f64 {
    let factor1: f64 = 2.0;
    let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;

    let sum: f64 = (0..=iterations).into_iter().map(|i| {
        let k: f64 = i as f64;

        let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
        let numm: f64 = 396.0;
        let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
        let term: f64 = factor * num / den;

        term
    }).fold(0.0, |a, b| a + b);

    return 1.0 / sum;
}

现在我们可以使用rayon的方法将其转换为并行操作。将 into_iter() 替换为 into_par_iter(),将 fold(0.0, |a, b| a + b) 替换为 reduce(|| 0.0, |a, b| a + b):

pub fn estimate_pi(iterations: usize) -> f64 {
    let factor1: f64 = 2.0;
    let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;

    // map is now a parallel map, and reduce is a parallel reduce
    let sum: f64 = (0..=iterations).into_par_iter().map(|i| {
        let k: f64 = i as f64;

        let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
        let numm: f64 = 396.0;
        let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
        let term: f64 = factor * num / den;

        term
    }).reduce(|| 0.0, |a, b| a + b);

    return 1.0 / sum;
}

现在稍微清理一下代码,使其更加地道:

  • 在适当的地方删除显式输入
  • 使用隐式returns
  • 对 sqrt(2) 使用常量
  • 更有意义的变量名
  • 在表达式中嵌入396
use std::f64::consts::*;

pub fn estimate_pi(iterations: usize) -> f64 {
    let factor = (SQRT_2 * 2.0) / 9801.0;

    let sum = (0..=iterations).into_par_iter().map(|i| {
        let k = i as f64;

        let numerator = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
        let denominator = factorial(k).powf(4.0) * (396_f64).powf(4.0 * k);

        factor * numerator / denominator
    }).reduce(|| 0.0, |a, b| a + b);

    1.0 / sum
}

作为最后一步,我们也可以使 factorial 并行:

// now have to call this with a `usize`
pub fn factorial(n: usize) -> f64 {
    let out = (1..=n).into_par_iter().reduce(|| 1, |a, b| a * b);
    out as f64
}

pub fn estimate_pi(iterations: usize) -> f64 {
    let factor = (SQRT_2 * 2.0) / 9801.0;

    let sum = (0..=iterations).into_par_iter().map(|i| {
        let k = i as f64;

        // notice we now pass the `i: usize` in here
        let numerator = factorial(4 * i) * (1103.0 + (26390.0 * k));
        let denominator = factorial(i).powf(4.0) * (396_f64).powf(4.0 * k);

        factor * numerator / denominator
    }).reduce(|| 0.0, |a, b| a + b);

    1.0 / sum
}

最终代码

use rayon::prelude::*;
use std::f64::consts::*;

pub fn factorial(n: usize) -> f64 {
    let out = (1..=n).into_par_iter().reduce(|| 1, |a, b| a * b);
    out as f64
}

pub fn estimate_pi(iterations: usize) -> f64 {
    let factor = (SQRT_2 * 2.0) / 9801.0;

    let sum = (0..=iterations).into_par_iter().map(|i| {
        let k = i as f64;

        let numerator = factorial(4 * i) * (1103.0 + (26390.0 * k));
        let denominator = factorial(i).powf(4.0) * (396_f64).powf(4.0 * k);

        factor * numerator / denominator
    }).reduce(|| 0.0, |a, b| a + b);

    1.0 / sum
}

fn main() {
    // our algorithm results in the same value as the constant
    println!("pi_a: {:.60}", estimate_pi(2));
    println!("pi_c: {:.60}", PI);
}

输出

pi_a: 3.141592653589793115997963468544185161590576171875000000000000
pi_c: 3.141592653589793115997963468544185161590576171875000000000000

Playground

建议

您应该对具有不同并行度的不同版本进行基准测试,以查看性能或多或少。可能是人造丝并行迭代导致性能降低,因为总迭代次数太少了。

您还可以考虑对阶乘使用查找 table,因为 n <= k * 4 <= 8:

pub fn factorial(n: usize) -> f64 {
    const TABLE: [f64; 9] = [
        1.0,     // 0!
        1.0,     // 1!
        2.0,     // 2!
        6.0,     // 3!
        24.0,    // 4!
        120.0,   // 5!
        720.0,   // 6!
        5040.0,  // 7!
        40320.0, // 8!
    ];

    TABLE[n]
}

Playground

当然,启用内联也有帮助。