How can I create a HashMap using Rayon's parallel fold?

I'm trying to build a HashMap in a functional style while taking advantage of Rayon's parallelism.

Without Rayon, this works:

use std::collections::HashMap;

fn main() {
    let nums = [1, 2, 1, 2, 1, 2];
    let result: HashMap<i32, i32> =
        nums.iter()
            .filter(|x| *x % 2 == 0)
            .fold(HashMap::new(), |mut acc, x| {
                *acc.entry(*x).or_insert(0) += 1;
                acc
            });

    println!("{:?}", result);
}

If I switch from iter() to par_iter() to use multiple cores, I get an error:

use rayon::prelude::*; // 1.5.1
use std::collections::HashMap;

fn main() {
    let nums = [1, 2, 1, 2, 1, 2];
    let result: HashMap<i32, i32> =
        nums.par_iter()
            .filter(|x| *x % 2 == 0)
            .fold(HashMap::new(), |mut acc, x| {
                *acc.entry(*x).or_insert(0) += 1;
                acc
            });

    println!("{:?}", result);
}

error[E0277]: expected a `Fn<()>` closure, found `HashMap<_, _>`
 --> src/main.rs:9:19
  |
9 |             .fold(HashMap::new(), |mut acc, x| {
  |                   ^^^^^^^^^^^^^^ expected an `Fn<()>` closure, found `HashMap<_, _>`
  |
  = help: the trait `Fn<()>` is not implemented for `HashMap<_, _>`
  = note: wrap the `HashMap<_, _>` in a closure with no arguments: `|| { /* code */ }`

error[E0308]: mismatched types
  --> src/main.rs:7:9
   |
6  |       let result: HashMap<i32, i32> =
   |                   ----------------- expected due to this
7  | /         nums.par_iter()
8  | |             .filter(|x| *x % 2 == 0)
9  | |             .fold(HashMap::new(), |mut acc, x| {
10 | |                 *acc.entry(*x).or_insert(0) += 1;
11 | |                 acc
12 | |             });
   | |______________^ expected struct `HashMap`, found struct `Fold`
   |
   = note: expected struct `HashMap<i32, i32>`
              found struct `Fold<rayon::iter::Filter<rayon::slice::Iter<'_, {integer}>, [closure@src/main.rs:8:21: 8:36]>, HashMap<_, _>, _>`

Clearly Rust is trying to stop me from doing something foolish involving a race condition, but how can I build a HashMap from a par_iter()?

Rayon's fold produces intermediate items, and there is no way to know in advance how many. From the documentation:

Parallel fold is similar to sequential fold except that the sequence of items may be subdivided before it is folded. Consider a list of numbers like 22 3 77 89 46. If you used sequential fold to add them (fold(0, |a,b| a+b), you would wind up first adding 0 + 22, then 22 + 3, then 25 + 77, and so forth. The parallel fold works similarly except that it first breaks up your list into sublists, and hence instead of yielding up a single sum at the end, it yields up multiple sums. The number of results is nondeterministic, as is the point where the breaks occur.
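
To make the "multiple sums" point concrete, here is a minimal std-only sketch that imitates the shape of a parallel fold on the numbers from the quote. It runs sequentially and does not use Rayon; `chunked_fold` and its fixed `chunk_size` are hypothetical names introduced only for illustration, since Rayon chooses its own split points and they are nondeterministic.

```rust
// Std-only imitation of the *shape* of a parallel fold; no threads involved.
// `chunked_fold` and `chunk_size` are hypothetical: Rayon decides where to
// split on its own. Note that `chunks` panics if `chunk_size` is 0.
fn chunked_fold<T, A>(
    items: &[T],
    chunk_size: usize,
    identity: impl Fn() -> A,
    op: impl Fn(A, &T) -> A,
) -> Vec<A> {
    items
        .chunks(chunk_size)
        // Every sublist starts from its own fresh accumulator, which is why
        // the identity has to be a function rather than a single value.
        .map(|chunk| chunk.iter().fold(identity(), &op))
        .collect()
}

fn main() {
    let nums = [22, 3, 77, 89, 46];

    // One partial sum per sublist instead of a single total.
    let partial = chunked_fold(&nums, 2, || 0i32, |a: i32, b: &i32| a + *b);
    println!("{:?}", partial); // prints [25, 166, 46]

    // A second step is needed to combine the partial results.
    let total: i32 = partial.iter().sum();
    println!("{}", total); // prints 237
}
```

With a different chunk size the list of partial sums changes, but the combined total stays the same.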

You need to reduce those intermediate items into a single final one:

use rayon::prelude::*; // 1.5.1
use std::collections::HashMap;

fn main() {
    let nums = [1, 2, 1, 2, 1, 2];
    let result: HashMap<i32, i32> = nums
        .par_iter()
        .filter(|x| *x % 2 == 0)
        .fold(HashMap::new, |mut acc, x| {
            *acc.entry(*x).or_insert(0) += 1;
            acc
        })
        .reduce_with(|mut m1, m2| {
            for (k, v) in m2 {
                *m1.entry(k).or_default() += v;
            }
            m1
        })
        .unwrap();

    println!("{:?}", result);
}
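
The merge step inside `reduce_with` can also be looked at in isolation. The sketch below is std-only and uses made-up partial maps; `merge_counts` is a hypothetical helper name, and the way the input is split into two partial results is an assumption for illustration. It uses the standard library's `Iterator::reduce`, which returns an `Option` in the same way `reduce_with` does.

```rust
use std::collections::HashMap;

// Hypothetical helper: the same merge logic as the `reduce_with` closure,
// written as a plain function so it can be run without Rayon.
fn merge_counts(mut m1: HashMap<i32, i32>, m2: HashMap<i32, i32>) -> HashMap<i32, i32> {
    for (k, v) in m2 {
        // Add the counts from the second map onto the first.
        *m1.entry(k).or_default() += v;
    }
    m1
}

fn main() {
    // Assume the fold split the filtered input [2, 2, 2] into two sublists
    // and produced these partial counts (the split itself is made up).
    let partials = vec![HashMap::from([(2, 1)]), HashMap::from([(2, 2)])];

    // Combine the partial maps pairwise; the result is None for no input.
    let merged = partials.into_iter().reduce(merge_counts);
    println!("{:?}", merged); // prints Some({2: 3})
}
```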

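Also note that the first argument to Rayon's fold is a function that creates an empty HashMap (HashMap::new), not an empty HashMap value (HashMap::new()) as with the standard library's fold.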