Combining multiple maps stored on a channel in Go (values of identical keys are summed)

Question

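My objective is to write a program that counts, in parallel, the occurrences of each unique word in a text file. All of the counts must end up in a single map.

What I do here is read the text file into a string and split that string into a slice of words. The slice is then divided into two halves of equal length, which are fed concurrently to the mapper function.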

func WordCount(text string) map[string]int {
    wg := new(sync.WaitGroup)
    s := strings.Fields(text)

    freq := make(map[string]int, len(s))
    channel := make(chan map[string]int, 2)

    wg.Add(1)
    go mappers(s[0:(len(s)/2)], freq, channel, wg)
    wg.Add(1)
    go mappers(s[(len(s)/2):], freq, channel, wg)
    wg.Wait()

    actualMap := <-channel

    return actualMap
}

func mappers(slice []string, occurrences map[string]int, ch chan map[string]int, wg *sync.WaitGroup)  {
    var l = sync.Mutex{}
    for _, word := range slice {
        l.Lock()
        occurrences[word]++
        l.Unlock()

    }
    ch <- occurrences
    wg.Done()
}

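The main problem is that when I run the code I get a huge multi-line error that begins with

fatal error: concurrent map writes

I thought I had guarded against this with the mutex: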

        l.Lock()
        occurrences[word]++
        l.Unlock()

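What am I doing wrong here? Also, how can I combine all of the maps sent on the channel? By combining I mean that the values of identical keys are summed in a new map.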

Answer 1

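The main problem is that you create a separate lock in each goroutine. That does nothing to serialize access to the map. Every goroutine must use the same lock.

And since you use the same map in every goroutine, you don't have to merge anything, and you don't need a channel to deliver the result.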
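To illustrate the shared-lock variant, here is a minimal sketch, assuming hypothetical names `countShared` and `WordCountShared` (they are not from the question). One mutex and one map are created in the caller and passed by pointer to both goroutines, and the `WaitGroup` tells the caller when it is safe to read the map:

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// countShared (hypothetical helper) counts words into one shared map,
// guarding every write with the single mutex passed in by the caller.
func countShared(words []string, counts map[string]int, mu *sync.Mutex, wg *sync.WaitGroup) {
	defer wg.Done()
	for _, w := range words {
		mu.Lock()
		counts[w]++
		mu.Unlock()
	}
}

// WordCountShared (hypothetical name) splits the input into two halves and
// lets two goroutines update the same map under the same lock.
func WordCountShared(text string) map[string]int {
	words := strings.Fields(text)
	counts := make(map[string]int)

	var mu sync.Mutex // one lock, shared by all goroutines
	var wg sync.WaitGroup

	half := len(words) / 2
	wg.Add(2)
	go countShared(words[:half], counts, &mu, &wg)
	go countShared(words[half:], counts, &mu, &wg)
	wg.Wait() // after this returns, no goroutine touches counts any more

	return counts
}

func main() {
	fmt.Println(WordCountShared("aa ab cd cd de ef a x cd aa"))
}
```

Note that the mutex is passed as `*sync.Mutex`. Passing it by value would copy the lock, which brings back the original problem.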

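Even if you use the same mutex in each goroutine, sharing a single map will likely not help performance, because the goroutines have to compete with each other for the map's lock.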

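Instead, you should create a separate map in each goroutine, use it to count locally, and then deliver the resulting map on the channel. This may give you a performance boost.

You then don't need a lock at all, because each goroutine has its own map, which it can read and write without a mutex.

You do, however, have to deliver the results on the channel and then merge them.

And since the goroutines deliver their results on the channel, the wait group becomes unnecessary: receiving both maps already waits for both goroutines.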

func WordCount(text string) map[string]int {
    s := strings.Fields(text)

    // Buffered for two results, so neither goroutine blocks on send.
    channel := make(chan map[string]int, 2)

    go mappers(s[0:(len(s)/2)], channel)
    go mappers(s[(len(s)/2):], channel)

    // Receive one map per goroutine and sum the values of identical keys.
    total := map[string]int{}
    for i := 0; i < 2; i++ {
        m := <-channel
        for k, v := range m {
            total[k] += v
        }
    }

    return total
}

func mappers(slice []string, ch chan map[string]int) {
    // Local map: only this goroutine touches it, so no lock is needed.
    occurrences := map[string]int{}
    for _, word := range slice {
        occurrences[word]++
    }
    ch <- occurrences
}

Example for testing it:

fmt.Println(WordCount("aa ab cd cd de ef a x cd aa"))

Output (try it on the Go Playground):

map[a:1 aa:2 ab:1 cd:3 de:1 ef:1 x:1]
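If you later want more than two goroutines, the same receive-and-sum loop carries over. The following is only a sketch: `countWords`, `mergeCounts` and `WordCountN` are hypothetical names, and `workers` is assumed to be at least 1.

```go
package main

import (
	"fmt"
	"strings"
)

// countWords counts into a private map and sends that map on ch.
func countWords(words []string, ch chan<- map[string]int) {
	local := make(map[string]int)
	for _, w := range words {
		local[w]++
	}
	ch <- local
}

// mergeCounts receives exactly n maps from ch and sums the values of
// identical keys into one new map.
func mergeCounts(ch <-chan map[string]int, n int) map[string]int {
	total := make(map[string]int)
	for i := 0; i < n; i++ {
		for k, v := range <-ch {
			total[k] += v
		}
	}
	return total
}

// WordCountN splits the words into `workers` chunks (assumed >= 1),
// counts each chunk in its own goroutine, and merges the partial maps.
func WordCountN(text string, workers int) map[string]int {
	words := strings.Fields(text)
	ch := make(chan map[string]int, workers)
	chunk := (len(words) + workers - 1) / workers // ceiling division
	for i := 0; i < workers; i++ {
		// Clamp both bounds so surplus workers get an empty slice.
		lo := i * chunk
		if lo > len(words) {
			lo = len(words)
		}
		hi := lo + chunk
		if hi > len(words) {
			hi = len(words)
		}
		go countWords(words[lo:hi], ch)
	}
	return mergeCounts(ch, workers)
}

func main() {
	fmt.Println(WordCountN("aa ab cd cd de ef a x cd aa", 3))
}
```

Because `mergeCounts` receives exactly one map per started goroutine, there is still no need for a wait group or for closing the channel.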

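Also note that this looks good in theory, but in practice you may still see no performance gain: each goroutine does too little work, so the cost of launching the goroutines and merging the results may outweigh the benefit.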
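One way to check this on your own input is to time a plain sequential count against the two-goroutine version. The sketch below is only a rough comparison, not a proper benchmark. The helper names `countSequential` and `countConcurrent` and the repeated sample text are assumptions made for illustration, and a benchmark run with `go test -bench` would give more reliable numbers.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
	"time"
)

// countSequential is the single-goroutine baseline.
func countSequential(words []string) map[string]int {
	counts := make(map[string]int)
	for _, w := range words {
		counts[w]++
	}
	return counts
}

// countConcurrent counts the two halves in separate goroutines, each with
// its own map, and then sums the two partial maps.
func countConcurrent(words []string) map[string]int {
	ch := make(chan map[string]int, 2)
	half := len(words) / 2
	for _, part := range [][]string{words[:half], words[half:]} {
		go func(p []string) { ch <- countSequential(p) }(part)
	}
	total := make(map[string]int)
	for i := 0; i < 2; i++ {
		for k, v := range <-ch {
			total[k] += v
		}
	}
	return total
}

func main() {
	// Assumed sample input: the test sentence repeated many times.
	words := strings.Fields(strings.Repeat("aa ab cd cd de ef a x cd aa ", 200000))

	start := time.Now()
	seq := countSequential(words)
	seqTime := time.Since(start)

	start = time.Now()
	con := countConcurrent(words)
	conTime := time.Since(start)

	fmt.Println("sequential:", seqTime)
	fmt.Println("concurrent:", conTime)
	fmt.Println("same result:", reflect.DeepEqual(seq, con))
}
```

The timings vary from machine to machine and from run to run, so treat a single run as a hint only.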
