Synchronize value (counter) across goroutines

Question

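I have a Go application that iterates over the pages of a website and should download every link on the site. It looks roughly like this (I don't know the number of pages in advance, so the paging itself is done synchronously):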

page := 0
results := getPage(page)
c := make(chan *http.Response)
for len(results) > 0 {
  for result := range results {
    go myProxySwitcher.downloadChan(result.URL, c)
    fmt.Println(myProxySwitcher.counter)
  }
  page++
  results = getPage(page)
  myProxySwitcher.counter++
}

The catch is that every 10 requests I want to change the proxy used to connect to the website. To do that, I made a struct with a counter member:

type ProxySwitcher struct {
    proxies []string
    client  *http.Client
    counter int
}

Then I increment the counter every time a request is made from downloadChan.

func (p *ProxySwitcher) downloadChan(url string, c chan *http.Response) {
    p.counter++
    proxy := p.proxies[int(p.counter/10)%len(p.proxies)]
    res := p.client.Get(url, proxy)
    c <- res

}

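While the downloads are running, the counter does not appear to be synchronized between the goroutines. How can I synchronize the counter's value across goroutines?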

The result I get from those Printlns is:

And I'm expecting:

1
2
3
4
5
...

Answer 1

You have a race condition in your code.

In the first snippet, you modify the counter field from the "main" goroutine:

  // ...
  myProxySwitcher.counter++

In the third snippet, you also modify that counter from another goroutine:

  // ...
  p.counter++

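This is illegal code in Go: two goroutines write the same variable with no synchronization between them, so the result is undefined by definition. To understand why, you would have to look at the Go Memory Model. Fair warning: it may not be an easy read.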

To fix it, you need to make sure every access to the counter is synchronized. There are many ways to do that.

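As suggested in the comments on your question, one way is to use a mutex. Below is an example. It is a bit messy, since the main loop would need some refactoring, but this is how you synchronize access to the counter: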

type ProxySwitcher struct {
  proxies []string
  client  *http.Client
    
  mu sync.Mutex
  counter int
}

func (p *ProxySwitcher) downloadChan(url string, c chan *http.Response) {
  p.mu.Lock()
  p.counter++
  // gotta read it from p while holding
  // the lock to use it below
  counter := p.counter
  p.mu.Unlock()

  // here you use counter rather than p.counter,
  // since you don't hold the lock anymore
  proxy := p.proxies[int(counter/10)%len(p.proxies)]
  res := p.client.Get(url, proxy)
  c <- res
}

// ... the loop ...
for len(results) > 0 {
  for result := range results {
    go myProxySwitcher.downloadChan(result.URL, c)
    
    // this is kinda messy, would need some heavier
    // refactoring, but this should fix the race:
    myProxySwitcher.mu.Lock()
    fmt.Println(myProxySwitcher.counter)
    myProxySwitcher.mu.Unlock()
  }
  page++
  results = getPage(page)

  // same... it's messy, needs refactoring
  myProxySwitcher.mu.Lock()
  myProxySwitcher.counter++
  myProxySwitcher.mu.Unlock()
}
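
To make the mutex version easier to try out, here is a minimal, self-contained sketch of the same idea. It is only an illustration under some assumptions: the HTTP client is left out so it runs without network access, and nextProxy, Count and runRequests are hypothetical helper names that do not appear in the question.

```go
package main

import (
	"fmt"
	"sync"
)

// ProxySwitcher mirrors the struct above, minus the HTTP client,
// so the sketch runs without any network access.
type ProxySwitcher struct {
	proxies []string

	mu      sync.Mutex // guards counter
	counter int
}

// nextProxy (hypothetical helper) increments the counter while holding
// the lock and returns the new count plus the proxy chosen for it.
func (p *ProxySwitcher) nextProxy() (int, string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.counter++
	return p.counter, p.proxies[(p.counter/10)%len(p.proxies)]
}

// Count (hypothetical helper) reads the counter while holding the lock.
func (p *ProxySwitcher) Count() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.counter
}

// runRequests (hypothetical helper) starts n goroutines that each claim
// a proxy, then waits for all of them to finish.
func runRequests(p *ProxySwitcher, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p.nextProxy() // a real download would happen here
		}()
	}
	wg.Wait()
}

func main() {
	p := &ProxySwitcher{proxies: []string{"proxy-a", "proxy-b", "proxy-c"}}
	runRequests(p, 25)
	fmt.Println(p.Count()) // prints 25: no increment is lost
}
```

Keeping the lock inside small methods means the calling loop never touches mu directly, which is the kind of refactoring hinted at in the comments above. Running the program with go run -race should report no data race.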

Alternatively, you could change the counter to, for example, a uint64, and then use the sync/atomic package to perform goroutine-safe operations on it:

type ProxySwitcher struct {
  proxies []string
  client  *http.Client
  counter uint64
}

func (p *ProxySwitcher) downloadChan(url string, c chan *http.Response) {
  counter := atomic.AddUint64(&p.counter, 1)

  // here you use counter rather than p.counter, since that's your local copy
  proxy := p.proxies[int(counter/10)%len(p.proxies)]
  res := p.client.Get(url, proxy)
  c <- res
}

// ... the loop ...
for len(results) > 0 {
  for result := range results {
    go myProxySwitcher.downloadChan(result.URL, c)
    counter := atomic.LoadUint64(&myProxySwitcher.counter)
    fmt.Println(counter)
  }
  page++
  results = getPage(page)

  atomic.AddUint64(&myProxySwitcher.counter, 1)
}

I would probably go with this last version, since it is cleaner and we don't really need a mutex.
