Goroutine in IO wait state for long time

I have a high-traffic server built with go1.7 (more than 800K qps).

From http://urltoserver:debugport/debug/pprof/goroutine?debug=2 I see 8K goroutines, of which nearly 1800 have been in IO wait for minutes. One such goroutine stack is shown below.

    goroutine 128328653 [IO wait, 54 minutes]:
    net.runtime_pollWait(0x7f0fcc60c378, 0x72, 0x7cb)
      /usr/local/go/src/runtime/netpoll.go:160 +0x59
    net.(*pollDesc).wait(0xc4231d0a00, 0x72, 0xc42479fa20, 0xc42000c048)
      /usr/local/go/src/net/fd_poll_runtime.go:73 +0x38
    net.(*pollDesc).waitRead(0xc4231d0a00, 0x92f200, 0xc42000c048)
      /usr/local/go/src/net/fd_poll_runtime.go:78 +0x34
    net.(*netFD).Read(0xc4231d09a0, 0xc423109000, 0x1000, 0x1000, 0x0, 0x92f200, 0xc42000c048)
      /usr/local/go/src/net/fd_unix.go:243 +0x1a1
    net.(*conn).Read(0xc4234282b8, 0xc423109000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
      /usr/local/go/src/net/net.go:173 +0x70
    net/http.(*connReader).Read(0xc420449840, 0xc423109000, 0x1000, 0x1000, 0xc422b38b68, 0x100000000, 0xc421810601)
      /usr/local/go/src/net/http/server.go:586 +0x144
    bufio.(*Reader).fill(0xc422e22360)
      /usr/local/go/src/bufio/bufio.go:97 +0x10c
    bufio.(*Reader).Peek(0xc422e22360, 0x4, 0x7a066c, 0x4, 0x1, 0x0, 0x0)
      /usr/local/go/src/bufio/bufio.go:129 +0x62
    net/http.(*conn).readRequest(0xc422b38b00, 0x931fc0, 0xc424d19440, 0x0, 0x0, 0x0)
      /usr/local/go/src/net/http/server.go:762 +0xdff
    net/http.(*conn).serve(0xc422b38b00, 0x931fc0, 0xc424d19440)
      /usr/local/go/src/net/http/server.go:1532 +0x3d3
    created by net/http.(*Server).Serve
      /usr/local/go/src/net/http/server.go:2293 +0x44d
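With thousands of goroutines in a debug=2 dump, tallying the header lines is quicker than reading stacks one by one. The sketch below is a minimal, hypothetical helper (the names summarize and headerRE are illustrative, not part of any library) that assumes the dump text has already been fetched from the endpoint above. It counts goroutines per state and how many carry a wait duration of at least minMinutes; the runtime only appends the ", N minutes" suffix once a goroutine has been blocked for a minute or more.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// headerRE matches goroutine header lines in a debug=2 dump, for example
// "goroutine 128328653 [IO wait, 54 minutes]:". Group 1 is the state and
// group 2, when present, is the number of minutes the goroutine has been blocked.
var headerRE = regexp.MustCompile(`^goroutine \d+ \[([^,\]]+)(?:, (\d+) minutes)?`)

// summarize counts goroutines per state and how many of them have been
// blocked for at least minMinutes. Stack frame lines are ignored.
func summarize(dump string, minMinutes int) (byState map[string]int, longWaits int) {
	byState = make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(dump))
	for sc.Scan() {
		m := headerRE.FindStringSubmatch(strings.TrimSpace(sc.Text()))
		if m == nil {
			continue // not a goroutine header line
		}
		byState[m[1]]++
		if m[2] != "" {
			if mins, err := strconv.Atoi(m[2]); err == nil && mins >= minMinutes {
				longWaits++
			}
		}
	}
	return byState, longWaits
}

func main() {
	sample := `goroutine 128328653 [IO wait, 54 minutes]:
net.runtime_pollWait(0x7f0fcc60c378, 0x72, 0x7cb)

goroutine 7 [IO wait]:
net.runtime_pollWait(0x7f0fcc60c378, 0x72, 0x7cb)

goroutine 1 [running]:
main.main()
`
	byState, longWaits := summarize(sample, 10)
	fmt.Println(byState["IO wait"], byState["running"], longWaits) // prints "2 1 1"
}
```

Matching only the header lines keeps the helper independent of the stack frames, which change between Go versions.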

Has anyone run into this issue? Any pointers are appreciated.

These are most likely clients that started a request but never completed it, or slow clients, and so on.

You should configure the server's read and write timeouts (the ReadTimeout and WriteTimeout fields of http.Server, respectively):

    s := &http.Server{
        // ...
        ReadTimeout:  5 * time.Second, // maximum time allowed for reading a request
        WriteTimeout: 5 * time.Second, // maximum time allowed for writing the response
    }
    // ...