Unusually High Amount of TCP Connection Timeout Errors

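I am using a Go TCP client to connect to our Go TCP server.

I can connect to the server and run commands correctly, but my TCP client often reports an unusually high number of consecutive TCP connection errors, either when trying to connect to our TCP server or when sending a message after connecting: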

dial tcp kubernetes_node_ip:exposed_kubernetes_port:
connectex: A connection attempt failed because the connected party did not properly
respond after a period of time, or established connection failed because connected
host has failed to respond.

read tcp unfamiliar_ip:unfamiliar_port->kubernetes_node_ip:exposed_kubernetes_port
wsarecv: A connection attempt failed because the connected party did not properly
respond after a period of time, or established connection failed because connected
host has failed to respond.

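I say "unusually high" because I assume these errors should occur very rarely (about 5 times or fewer per hour). Note that I have not ruled out an unstable connection as the cause, since I have also noticed that I can run several commands in quick succession without any errors.

However, I will still post my code in case I am doing something wrong.

Below is the code my TCP client uses to connect to our server: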

serverAddress, err := net.ResolveTCPAddr("tcp", kubernetes_ip+":"+kubernetes_port)
if err != nil {     
    fmt.Println(err)
    return
}

// Never stop asking for commands from the user.
for {
    // Connect to the server.
    serverConnection, err := net.DialTCP("tcp", nil, serverAddress)
    if err != nil {         
        fmt.Println(err)
        continue
    }

    defer serverConnection.Close()

    // Added to prevent connection timeout errors, but doesn't seem to be helping
    // because said errors happen within just 1 or 2 minutes.
    err = serverConnection.SetDeadline(time.Now().Add(10 * time.Minute))
    if err != nil {         
        fmt.Println(err)
        continue
    }

    // Ask for a command from the user and convert to JSON bytes...

    // Send message to server.
    _, err = serverConnection.Write(clientMsgBytes)
    if err != nil {
        err = merry.Wrap(err)
        fmt.Println(merry.Details(err))
        continue
    }

    err = serverConnection.CloseWrite()
    if err != nil {
        err = merry.Wrap(err)
        fmt.Println(merry.Details(err))
        continue
    }

    // Wait for a response from the server and print...
}

Below is the code our TCP server uses to accept client requests:

// We only supply the port so the IP can be dynamically assigned:
serverAddress, err := net.ResolveTCPAddr("tcp", ":"+server_port)
if err != nil {     
    return err
}

tcpListener, err := net.ListenTCP("tcp", serverAddress)
if err != nil {     
    return err
}

defer tcpListener.Close()

// Never stop listening for client requests.
for {
    clientConnection, err := tcpListener.AcceptTCP()
    if err != nil {         
        fmt.Println(err)
        continue
    }

    go func() {
        // Add client connection to Job Queue.
        // Note that `clientConnections` is a buffered channel with a size of 1500.
        // Since I am the only user connecting to our server right now, I do not think
        // this is a channel blocking issue.
        clientConnections <- clientConnection
    }()
}

Below is the code our TCP server uses to handle client requests:

defer clientConnection.Close()

// Added to prevent connection timeout errors, but doesn't seem to be helping
// because said errors happen within just 1 or 2 minutes.
err := clientConnection.SetDeadline(time.Now().Add(10 * time.Minute))
if err != nil {     
    return err
}

// Read full TCP message.
// Does not stop until an EOF is reported by `CloseWrite()`
clientMsgBytes, err := ioutil.ReadAll(clientConnection)
if err != nil {
    err = merry.Wrap(err)
    return nil, err
}

// Process the message bytes...

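My questions are:

  1. Am I doing something wrong in the code above, or is it good enough for basic TCP client-server operations?

  2. Is it OK for both the TCP client and the TCP server to have code that defers closing their end of the connection?

  3. I seem to recall that calling defer inside a loop does nothing. How do I properly close the client connection before starting a new one?

Some additional information:

It seems this code does not behave the way you think it does. The defer statement on the connection's Close runs only when the enclosing function returns, not at the end of each loop iteration. As far as I can see, your client therefore opens a new connection on every iteration and never closes the old ones, which may be the problem.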

serverAddress, err := net.ResolveTCPAddr("tcp", kubernetes_ip+":"+kubernetes_port)
if err != nil {     
    fmt.Println(err)
    return
}

// Never stop asking for commands from the user.
for {
    // Connect to the server.
    serverConnection, err := net.DialTCP("tcp", nil, serverAddress)
    if err != nil {         
        fmt.Println(err)
        continue
    }

    defer serverConnection.Close()

    // Added to prevent connection timeout errors, but doesn't seem to be helping
    // because said errors happen within just 1 or 2 minutes.
    err = serverConnection.SetDeadline(time.Now().Add(10 * time.Minute))
    if err != nil {         
        fmt.Println(err)
        continue
    }

    // Ask for a command from the user and send to the server...

    // Wait for a response from the server and print...
}

I suggest writing it like this:

func start() {
    serverAddress, err := net.ResolveTCPAddr("tcp", kubernetes_ip+":"+kubernetes_port)
    if err != nil {     
        fmt.Println(err)
        return
    }
    for {
        if err := listen(serverAddress); err != nil {
            fmt.Println(err)
        }
    }
}

func listen(serverAddress *net.TCPAddr) error {
    // Connect to the server.
    serverConnection, err := net.DialTCP("tcp", nil, serverAddress)
    if err != nil {
        // There is no enclosing loop here, so return the error to the caller.
        return err
    }

    defer serverConnection.Close()

    // Never stop asking for commands from the user.
    for {
        // Added to prevent connection timeout errors, but doesn't seem to be helping
        // because said errors happen within just 1 or 2 minutes.
        err = serverConnection.SetDeadline(time.Now().Add(10 * time.Minute))
        if err != nil {         
           fmt.Println(err)
           return err
        }

        // Ask for a command from the user and send to the server...

        // Wait for a response from the server and print...
    }
}

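Also, you should keep a single connection or a pool of connections open, rather than opening and closing a connection for every message. Then, when you send a message, you get a connection from the pool (or use the single connection), write the message, wait for the response, and release the connection back to the pool.

Something like this: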

res, err := c.Send([]byte(`my message`))
if err != nil {
    // handle err
}

// the implementation of send
func (c *Client) Send(msg []byte) ([]byte, error) {
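    conn, err := c.pool.Get() // returns a connection from the pool or starts a new one
    if err != nil {
        return nil, err
    }
    // release the connection back to the pool when done
    // (Put is a hypothetical counterpart to Get)
    defer c.pool.Put(conn)
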
    // send your message and wait for response
    // ...
    return response, nil
}