有可能制作更高效的多线程代码吗？（双 for 循环 O(n^2)）

Question

我使用最小编辑距离算法来查找相似字符串。

我的代码主题是在输入的数据中找到兴趣爱好最接近的情侣

Hobby type
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

// * All of the person have 10 hobby. 
// * Hobby can not be duplicated.

// Input Data format
// 1000.txt
1000                   // number of data
N D R P Y X V B M T    // each line is person's hobbies
P J Z E X W H S M N
F U G L C T Q S O P
A D Q S F E N P Y G
K P D F G Q U I V H
D E I S N L C K H G
D I R K J V Q H M Z
Y A R D F T N P W O
R A G Z F M J K I Y
P E K W H S F Z G R
U J O T I B R Y E H
M A Z N S J H X T P
...
...

// This is Couple Model
struct Couple {
  let firstIdx: Int
  let firstArray: String

  let secondIdx: Int
  let secondArray: String

  let value: Int
}


// This function finds the couple with the closest hobbies among the input data.
func findCouple() {
    guard
      // Utility.makeData's return value is purified Input data. ex) N D R P Y X V B M T -> BDMNPRTVYX
      let data = Utility.makeData(using: "1000") 
      let count = Int(data[0]) else { return }

    var couples = [Couple]()
    var min = data[1].count

    // make GCD Group for handle each queue.
    let group = DispatchGroup()  

    for i in 1..<count {
      // Pivot for hobby compare
      let hobby = data[i]

      // make custom queue for multi-threading
      let queue = DispatchQueue(label: "idx.\(i).queue", attributes: .concurrent)

      queue.async(group: group) {
        for j in (i + 1)..<data.count {
          // This is the subject of comparison.
          let hobby2 = data[j]

          // Calculate for find similarly string
          let newMin = hobby.minimumEditDistance(other: hobby2)

          queue.async(group: group, qos: .userInitiated, flags: .barrier) {
            // If find more similarly string bundle
            if min >= newMin {
              min = newMin
              // Store the couple
              couples.append(
                Couple(firstIdx: i, firstArray: hobby, secondIdx: j, secondArray: hobby2, value: min)
              )
            }
          }
        }
      }
    }

    group.notify(queue: DispatchQueue.global()) {
      let similarCouples = couples.filter({[=12=].value == min}
      // I want to print like
      // 1-3 : index of persons
      // ASDFZXCVNM : 1 persons's hobby
      // ASDFXCVBJK : 2 persons's hobby
    }
}

如果输入数据足够大（10000 或更多），我的函数性能最差。（非常慢）

如果有任何改进，请告诉我。

Answer 1

毫无疑问，您可以在最初的尝试中得到很大的提高。坦率地说，尽管看起来令人惊讶，但存在严重的风险，即您当前的再现可能甚至比非并行版本效率更低。对两者进行基准测试并进行比较。

关于问题，它们是多方面的：

来电越多 async 并不总是更好。肆无忌惮的 async 调用会带来各种低效率。首先，工作线程的数量非常有限（目前为 64 个）。其次，无论如何，超过设备上可用内核的数量没有任何用处。

而不是这些 async 电话，我们经常联系 concurrentPerform。这通常是并行化 for 循环的最有效方式。这将永远不会超过您的硬件允许的可用并发线程数。

因此，考虑非并行再现：
```
for i in 0 ..< 9_999 {
    for j in i+1 ..< 10_000 {
        ...
    }
}
```
我们简单地将其并行化：
```
DispatchQueue.concurrentPerform(iterations: 9_999) { i in
    for j in i+1 ..< 10_000 {
        ...
    }
}
```
而且，因为 concurrentPerform 阻塞了您调用它的线程，您可能会将整个事情分派到后台队列：
```
DispatchQueue.global().async {
    DispatchQueue.concurrentPerform(iterations: 9_999) { i in
        for j in i+1 ..< 10_000 {
            ...
        }
    }

    DispatchQueue.main.async {
        // now update UI with the results of the above
    }
}
```
注意，我没有在 concurrentPerform 中使用任何 async 调用，因为它为我们完成了外部 for 循环的所有并行化。无论如何，在这种模式下，我实际上进行了 10,000 次异步调用（不是其中的 50m）。 concurrentPerform 确保我在执行此操作时不会出现线程爆炸。
坦率地说，即使是 10,000 次迭代也太多了。每个线程的工作量不够，每个线程的开销会增加。事实上，并行化例程的好处将被所有这些开销不利地抵消。

典型的解决方案是“跨越”计算。例如，对于 10,000 次迭代，一个人可能会跨越，进行 200 次迭代，每个 i 的 50 个值，或类似的东西。例如：
```
let stride = 50
DispatchQueue.concurrentPerform(iterations: 200) { iteration in
    let startIndex = iteration * stride
    let endIndex = min(9_999, startIndex + stride)
    for i in startIndex ..< endIndex {
        for j in i+1 ..< strings.count {
            ...
        }
    }
}
```
因此，之前从 50m async 次调用减少到 concurrentPerform 的 10k 次迭代，我们现在减少到只有 200 次迭代。这足以实现巨大的并行化性能提升，而且开销可以忽略不计。
您的代码不是线程安全的。您正在从多个线程更新 min 和 couples。是的，你正在使用屏障，但每个循环都有自己的队列，屏障只针对当前队列，而不是跨队列。你有一场数据竞赛。您必须同步对这些变量的访问。

由于存在与同步相关的开销，我可能建议您更进一步，弄清楚如何最大限度地减少所需的同步。例如，每个分派的块可能会计算自己的本地最小距离值，然后在块的末尾，才应该检查本地最小距离与主最小距离。

这些是帮助您最大化并行计算性能的一些技巧。在我的 MacBookPro 上，上面的计算比非并行再现快 9.9 倍。

有可能制作更高效的多线程代码吗？（双 for 循环 O(n^2)）

Is possible make more efficiency multithreaded code? (double for loop O(n^2))

multithreading

for-loop

grand-central-dispatch

swift

有可能制作更高效的多线程代码吗？ （双 for 循环 O(n^2)）

Is possible make more efficiency multithreaded code? (double for loop O(n^2))

multithreading

for-loop

grand-central-dispatch

swift

有可能制作更高效的多线程代码吗？（双 for 循环 O(n^2)）