I would like to parallelize my Clojure implementation

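OK, so I have an algorithm that loops through a file line by line and looks for a given word in each line. It returns not only the given word but also the words that appear before and after it; how many words to include is also given as a parameter.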

E.g. line = "I am overflowing with blessings and you also are"
     parameters = ("you" 2)
     output = (blessings and you also are)

(with-open [r (clojure.java.io/reader "resources/small.txt")]
  (doseq [l (line-seq r)]
    (let [x (topMostLoop l "good" 2)]
      (if (not (empty? x))
        (println x)))))

The code above works fine. But I want to parallelize it, so I did the following:

(with-open [r (clojure.java.io/reader "resources/small.txt")]
  (doseq [l (line-seq r)]
    (future
      (let [x (topMostLoop l "good" 2)]
        (if (not (empty? x))
          (println x))))))

But then the output gets all messed up. I know I need a lock somewhere, but I don't know where.

(require '[clojure.string :as str])

(defn topMostLoop [contents word next]
  (let [mywords (str/split contents #"[ ,\.]+")]
    ;; for every position of word, return the window of words around it
    (map (fn [element]
           (return-lines (max 0 (- element next))
                         (min (+ element next) (- (count mywords) 1))
                         mywords))
         (vec ((indexHashMap mywords) word)))))

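I would be glad if anyone could help me; this is the last thing I have left.

Note: if I need to post the other functions as well, please let me know.

For more clarity, I have added the other functions: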

(defn return-lines [firstItem lastItem contentArray]
  (take (+ (- lastItem firstItem) 1) 
        (map (fn [element] (str element))
             (vec (drop firstItem contentArray)))))

(defn indexHashMap [mywords]
  (->> (zipmap (range) mywords)     ;contents is a list of words
       (reduce (fn [index [location word]]
                 (merge-with concat index {word (list location)})) {})))

First, rewrite the first (serial) example so that it uses map:

(with-open [r (clojure.java.io/reader "resources/small.txt")]
  (doseq [l (map #(topMostLoop %1 "good" 2) (line-seq r))]
    (if (not (empty? l))
        (println l))))

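With this approach, the topMostLoop function is applied to each line, and map returns a lazy sequence of the results. In the body of doseq, each result is printed if it is not empty.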

After that, replace map with pmap, which runs the mapping in parallel; the results come out in the same order as the given lines:

(with-open [r (clojure.java.io/reader "resources/small.txt")]
  (doseq [l (pmap #(topMostLoop %1 "good" 2) (line-seq r))]
    (if (not (empty? l))
        (println l))))
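As a rough check of the ordering claim, here is a minimal sketch. `slow-square` is a hypothetical stand-in for topMostLoop, written so that later inputs finish sooner than earlier ones; it is not part of your code.

```clojure
(defn slow-square
  "Hypothetical stand-in for topMostLoop: sleeps longer for smaller n,
  so later items tend to finish before earlier ones."
  [n]
  (Thread/sleep (* 10 (- 10 n)))
  (* n n))

(def inputs (range 1 11))

;; doall forces the lazy sequences so the work really happens here
(def sequential (doall (map slow-square inputs)))
(def parallel   (doall (pmap slow-square inputs)))

;; pmap returns results in input order, whatever the completion order was
(println (= sequential parallel)) ; prints true
```

pmap runs on the same thread pool as future, so in a standalone script you would likely want to call (shutdown-agents) at the very end so that the JVM exits promptly.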

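In your version with futures, each future prints its own result as soon as it finishes, so the output usually comes out of order (some later futures finish executing sooner than earlier ones).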
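If you would rather keep explicit futures, one way to get ordered output is to have the futures only compute, and do all the printing from the calling thread. The following is a sketch under that assumption; `ordered-future-results` is a hypothetical helper, not something from your code or from the standard library.

```clojure
(defn ordered-future-results
  "Starts one future per item, then derefs the futures in input order.
  Hypothetical helper for illustration."
  [f coll]
  ;; mapv is eager, so every future is started before the first deref
  (let [futures (mapv #(future (f %)) coll)]
    ;; deref blocks until each future is done; results keep the input order
    (mapv deref futures)))

;; Later items sleep less and finish first, yet the order is preserved.
(def demo
  (ordered-future-results
   (fn [n] (Thread/sleep (* 10 (- 5 n))) (* n 10))
   (range 5)))
;; demo is [0 10 20 30 40]
```

With your code this would be called as (ordered-future-results #(topMostLoop % "good" 2) (line-seq r)) inside the with-open, followed by a doseq that prints the non-empty results. Note that it starts one future per line and holds all results in memory, so for a large file pmap is probably the better fit.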

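I tested this with the following modification (instead of reading a text file, I create a lazy sequence of vectors of numbers, search each vector for a value, and return its surroundings):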

(def lines (repeatedly #(shuffle (range 1 11))))
(def lines-10 (take 10 lines))

lines-10
([5 8 3 10 6 9 7 2 1 4]
[6 8 9 7 2 5 10 4 1 3]
[2 7 8 9 1 5 10 3 4 6]
[10 8 3 5 7 2 4 9 6 1]
[8 6 10 1 9 4 3 7 2 5]
[9 6 8 1 5 10 3 4 2 7]
[10 9 3 7 1 8 4 6 5 2]
[6 1 4 10 3 7 8 9 5 2]
[9 6 7 5 8 3 10 4 2 1]
[4 1 5 2 7 3 6 9 8 10])

(defn surrounding
  [v value size]
  (let [i (.indexOf v value)]
    (if (= i -1)
      nil
      (subvec v (max (- i size) 0) (inc (min (+ i size) (dec (count v))))))))

(doseq [l (map #(surrounding % 3 2) lines-10)] (if (not (empty? l)) (println l)))
[5 8 3 10 6]
[4 1 3]
[5 10 3 4 6]
[10 8 3 5 7]
[9 4 3 7 2]
[5 10 3 4 2]
[10 9 3 7 1]
[4 10 3 7 8]
[5 8 3 10 4]
[2 7 3 6 9]
nil

(doseq [l (pmap #(surrounding % 3 2) lines-10)] (if (not (empty? l)) (println l)))
[5 8 3 10 6]
[4 1 3]
[5 10 3 4 6]
[10 8 3 5 7]
[9 4 3 7 2]
[5 10 3 4 2]
[10 9 3 7 1]
[4 10 3 7 8]
[5 8 3 10 4]
[2 7 3 6 9]
nil
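As for the lock you asked about: if the order of the output does not matter and the only problem is that printed lines get mixed into each other, wrapping the println call in locking should be enough. This is a sketch, not a tested fix for your exact code; `print-lock`, `safe-println` and `print-all-unordered` are hypothetical names.

```clojure
;; A single shared object used only as a monitor for printing.
(def print-lock (Object.))

(defn safe-println
  "println that holds print-lock, so only one thread prints at a time."
  [& args]
  (locking print-lock
    (apply println args)))

(defn print-all-unordered
  "Applies f to every item in its own future and prints each non-empty
  result under the lock. Lines stay intact but may appear in any order."
  [f coll]
  (let [futures (mapv (fn [x]
                        (future
                          (let [result (f x)]
                            (when (seq result)
                              (safe-println result)))))
                      coll)]
    ;; wait for all futures so nothing is still printing when we return
    (doseq [fut futures]
      (deref fut))))
```

The lock only covers the printing, so the calls to f still run in parallel. The output order is still not deterministic, which is why pmap remains the simpler choice when the order must match the input.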