Proper order for closing TcpListener and TcpClient connections (which side should be the active close)

I read this answer on a previous question, which says:

So the peer that initiates the termination – i.e. calls close() first – will end up in the TIME_WAIT state. [...]

However, it can be a problem with lots of sockets in TIME_WAIT state on a server as it could eventually prevent new connections from being accepted. [...]

Instead, design your application protocol so the connection termination is always initiated from the client side. If the client always knows when it has read all remaining data it can initiate the termination sequence. As an example, a browser knows from the Content-Length HTTP header when it has read all data and can initiate the close. (I know that in HTTP 1.1 it will keep it open for a while for a possible reuse, and then close it.)

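I'd like to implement this using TcpClient/TcpListener, but it's not clear how to make it work properly.

Approach 1: Both sides close

This is the typical way most MSDN examples illustrate it: both sides call Close(), not just the client.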

private static void AcceptLoop()
{
    listener.BeginAcceptTcpClient(ar =>
    {
        var tcpClient = listener.EndAcceptTcpClient(ar);

        ThreadPool.QueueUserWorkItem(delegate
        {
            var stream = tcpClient.GetStream();
            ReadSomeData(stream);
            WriteSomeData(stream);
            tcpClient.Close();   // <-- note: the server closes as well
        });

        AcceptLoop();
    }, null);
}

private static void ExecuteClient()
{
    using (var client = new TcpClient())
    {
        client.Connect("localhost", 8012);

        using (var stream = client.GetStream())
        {
            WriteSomeData(stream);
            ReadSomeData(stream);
        }
    }
}

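After I run this with 20 clients, TCPView shows a lot of sockets from both the client and the server stuck in TIME_WAIT, and they take quite some time to disappear.

Approach 2: Only the client closes

Based on the quote above, I removed the Close() call on my listener side, and now I rely only on the client closing: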

var tcpClient = listener.EndAcceptTcpClient(ar);

ThreadPool.QueueUserWorkItem(delegate
{
    var stream = tcpClient.GetStream();
    ReadSomeData(stream);
    WriteSomeData(stream);
    // tcpClient.Close();   <-- Let the client close
});

AcceptLoop();

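Now I no longer have any TIME_WAIT, but I do get sockets left in various states such as CLOSE_WAIT and FIN_WAIT, and these also take a very long time to disappear.

Approach 3: Give the client time to close first

This time I added a delay before closing the server connection: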

var tcpClient = listener.EndAcceptTcpClient(ar);

ThreadPool.QueueUserWorkItem(delegate
{
    var stream = tcpClient.GetStream();
    ReadSomeData(stream);
    WriteSomeData(stream);
    Thread.Sleep(100);      // <-- Give the client the opportunity to close first
    tcpClient.Close();      // <-- Now server closes
});

AcceptLoop();

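This appears to be better: now only client sockets are in TIME_WAIT, and the server sockets have all closed properly.

This seems to agree with what the previously linked answer said: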

So the peer that initiates the termination – i.e. calls close() first – will end up in the TIME_WAIT state.

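Questions:

  1. Which of these approaches is the right way to go, and why? (Assuming I want the client to be the 'active close' side)
  2. Is there a better way to implement approach 3? We want the close to be initiated by the client (so that the client is left with the TIME_WAITs), but when the client closes, we also want to close the connection on the server.
  3. My scenario is actually the opposite of a web server; I have a single client that connects to and disconnects from many different remote machines. I'd rather the server have connections stuck in TIME_WAIT instead, to free up resources on my client. In this case, should I make the server perform the active close, and put the sleep/close on my client?

Full code that you can try yourself is here: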

https://gist.github.com/PaulStovell/a58cd48a5c6b14885cf3

EDIT: Another useful resource:

http://www.serverframework.com/asynchronousevents/2011/01/time-wait-and-its-design-implications-for-protocols-and-scalable-servers.html

For a server that does establish outbound connections as well as accepting inbound connections then the golden rule is to always ensure that if a TIME_WAIT needs to occur that it ends up on the other peer and not the server. The best way to do this is to never initiate an active close from the server, no matter what the reason. If your peer times out, abort the connection with an RST rather than closing it. If your peer sends invalid data, abort the connection, etc. The idea being that if your server never initiates an active close it can never accumulate TIME_WAIT sockets and therefore will never suffer from the scalability problems that they cause. Although it's easy to see how you can abort connections when error situations occur what about normal connection termination? Ideally you should design into your protocol a way for the server to tell the client that it should disconnect, rather than simply having the server instigate an active close. So if the server needs to terminate a connection the server sends an application level "we're done" message which the client takes as a reason to close the connection. If the client fails to close the connection in a reasonable time then the server aborts the connection.

On the client things are slightly more complicated, after all, someone has to initiate an active close to terminate a TCP connection cleanly, and if it's the client then that's where the TIME_WAIT will end up. However, having the TIME_WAIT end up on the client has several advantages. Firstly if, for some reason, the client ends up with connectivity issues due to the accumulation of sockets in TIME_WAIT it's just one client. Other clients will not be affected. Secondly, it's inefficient to rapidly open and close TCP connections to the same server so it makes sense beyond the issue of TIME_WAIT to try and maintain connections for longer periods of time rather than shorter periods of time. Don't design a protocol whereby a client connects to the server every minute and does so by opening a new connection. Instead use a persistent connection design and only reconnect when the connection fails, if intermediary routers refuse to keep the connection open without data flow then you could either implement an application level ping, use TCP keep alive or just accept that the router is resetting your connection; the good thing being that you're not accumulating TIME_WAIT sockets. If the work that you do on a connection is naturally short lived then consider some form of "connection pooling" design whereby the connection is kept open and reused. Finally, if you absolutely must open and close connections rapidly from a client to the same server then perhaps you could design an application level shutdown sequence that you can use and then follow this with an abortive close. Your client could send an "I'm done" message, your server could then send a "goodbye" message and the client could then abort the connection.

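That's just how TCP works; you can't avoid it. You can set different timeouts for TIME_WAIT or FIN_WAIT on your server, but that's about it.

The reason is that on TCP, a packet can arrive at a socket you closed long ago. If you have already opened another socket on the same IP and port, it would receive data meant for the previous session, which would confuse it. Especially given that most people consider TCP to be reliable :)

If both your client and server implement TCP properly (for example, by handling the clean shutdown correctly), it doesn't really matter whether the client or the server closes the connection. Since it sounds like you control both sides, that shouldn't be a problem.

Your issue seems to be with the proper shutdown on the server. When one side of the connection closes, a Read on the other side returns a length of 0. That is your message that communication is over. You're most likely ignoring it in your server code. It is a special case that says "you can now safely dispose of this socket, do it now".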
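As a rough illustration of the zero-length read described above, here is a minimal sketch over a loopback connection. The class and method names (ZeroReadDemo, ReadUntilPeerCloses, Run) are made up for this example and do not come from the question's code. It assumes a blocking read loop is acceptable, and it binds to port 0 so the OS picks a free port.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

// Hypothetical demo class; the names are illustrative only.
public static class ZeroReadDemo
{
    // Reads until Read returns 0, which signals that the peer has closed
    // (or shut down) its sending side.
    public static string ReadUntilPeerCloses(NetworkStream stream)
    {
        var buffer = new byte[1024];
        var received = new StringBuilder();
        int n;
        while ((n = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            received.Append(Encoding.ASCII.GetString(buffer, 0, n));
        }
        return received.ToString();
    }

    public static string Run()
    {
        // Port 0 asks the OS for any free port (an assumption made for the demo).
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;

        var serverTask = Task.Run(() =>
        {
            using (var accepted = listener.AcceptTcpClient())
            using (var stream = accepted.GetStream())
            {
                // Returns only once the client's close has been observed.
                return ReadUntilPeerCloses(stream);
            } // The server disposes its socket here, after the client closed.
        });

        using (var client = new TcpClient())
        {
            client.Connect(IPAddress.Loopback, port);
            var payload = Encoding.ASCII.GetBytes("hello");
            client.GetStream().Write(payload, 0, payload.Length);
        } // The client closes first, so it is the active closer.

        string result = serverTask.Result;
        listener.Stop();
        return result;
    }
}
```

Calling ZeroReadDemo.Run() should return the text the client sent. The server never sleeps; it closes as soon as its read loop sees the zero-length read.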

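In your case, closing on the server side seems to be the best fit.

But really, TCP is rather complicated. It doesn't help that most of the samples on the internet are seriously flawed (especially the C# ones; finding a good sample for C++, for example, is not that hard) and ignore many important parts of the protocol. I have a simple sample that might work for you: https://github.com/Luaancz/Networking/tree/master/Networking%20Part%201 It's still not perfect TCP, but it is much better than the MSDN samples.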

Which of these approaches is the right way to go, and why? (Assuming I want the client to be the 'active close' side)

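In an ideal world, you would have the server send the client a RequestDisconnect packet with a specific opcode, and the client would handle that packet by closing the connection when it receives it. That way you don't end up with stale sockets on the server side (sockets are resources and resources are finite, so stale ones are a bad thing).

If the client then goes through its disconnect sequence by disposing the socket (or by calling Close() if you're using a TcpClient), the socket on the server is put into the CLOSE_WAIT state, which means the connection is in the process of being closed.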
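One possible shape for such a disconnect request is sketched below. The one-byte opcode value 0xFF, the class name RequestDisconnectDemo, and the single-byte framing are all assumptions made for illustration. A real protocol would define its own message format.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

// Hypothetical demo class; the opcode and framing are illustrative assumptions.
public static class RequestDisconnectDemo
{
    private const byte RequestDisconnect = 0xFF; // made-up opcode

    // Returns true if the server observed the client's close before closing itself.
    public static bool Run()
    {
        var listener = new TcpListener(IPAddress.Loopback, 0); // port 0: OS picks a port
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;

        var serverTask = Task.Run(() =>
        {
            using (var accepted = listener.AcceptTcpClient())
            using (var stream = accepted.GetStream())
            {
                // Ask the client to initiate the close instead of closing first.
                stream.WriteByte(RequestDisconnect);

                // ReadByte returns -1 at end of stream, i.e. the client has closed.
                return stream.ReadByte() == -1;
            } // The server closes only after the client did.
        });

        using (var client = new TcpClient())
        {
            client.Connect(IPAddress.Loopback, port);
            int opcode = client.GetStream().ReadByte();
            if (opcode != RequestDisconnect)
            {
                throw new InvalidOperationException("Unexpected opcode: " + opcode);
            }
        } // Disposing the client performs the active close.

        bool clientClosedFirst = serverTask.Result;
        listener.Stop();
        return clientClosedFirst;
    }
}
```

In this sketch the server waits indefinitely for the client to close. A production server would probably bound that wait, for instance with a receive timeout, and drop the connection if the client never responds.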

Is there a better way to implement approach 3? We want the close to be initiated by the client (so that the client is left with the TIME_WAITs), but when the client closes, we also want to close the connection on the server.

Yes. Same as above: have the server send a packet asking the client to close its connection to the server.

My scenario is actually opposite to a web server; I have a single client that connects and disconnects from many different remote machines. I'd rather the server have connections stuck in TIME_WAIT instead, to free up resources on my client. In this case, should I make the server perform the active close, and put the sleep/close on my client?

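Yes, if that's what you're after, feel free to call Dispose on the server to get rid of the clients.

On a side note, you may want to look into using raw Socket objects rather than TcpClient, which is extremely limited. If you operate on sockets directly, you have SendAsync and all the other async methods for socket operations at your disposal. I would avoid manual Thread.Sleep calls. These operations are asynchronous by nature, so disconnecting after you have written something to the stream should be done in the callback of SendAsync rather than after a Sleep.

Paul, you've done some great research yourself. I've been working in this area a bit too. One article I've found very useful on the topic of TIME_WAIT is this: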

http://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux.html

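It covers some Linux-specific issues, but all of the TCP-level content is universal.

Ultimately both sides should close (i.e. complete the FIN/ACK handshake), because you don't want FIN_WAIT or CLOSE_WAIT states lingering; that is just "bad" TCP. I would avoid using RST to force connections closed, as it is likely to cause problems elsewhere and just feels like being a poor netizen.

It is true that the TIME_WAIT state ends up on the side that terminates the connection first (i.e. sends the first FIN packet), so you should optimise to close the connection first on the side that will have the least connection churn.

On Windows, you have just over 15,000 TCP ports available per IP by default, so you need a substantial rate of connection churn before you run out. The memory the TCBs use to track the TIME_WAIT states should be quite acceptable.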

https://support.microsoft.com/kb/929851

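It's also important to note that a TCP connection can be half-closed. That is, one end can choose to close the connection for sending but leave it open for receiving. In .NET, this is done like so: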

tcpClient.Client.Shutdown(SocketShutdown.Send);

http://msdn.microsoft.com/en-us/library/system.net.sockets.socket.shutdown.aspx

I found this to be necessary when porting part of the netcat tool from Linux to PowerShell:

http://www.powershellmagazine.com/2014/10/03/building-netcat-with-powershell/
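To make the half-close concrete, here is a small sketch of a request/response exchange in which the client shuts down its sending side and then keeps reading. The class name HalfCloseDemo, the "echo:" reply format, and the loopback setup are assumptions for illustration only.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

// Hypothetical demo class; the names and message format are illustrative only.
public static class HalfCloseDemo
{
    // Reads until Read returns 0 (the peer has finished sending).
    private static string ReadToEnd(NetworkStream stream)
    {
        var buffer = new byte[1024];
        var text = new StringBuilder();
        int n;
        while ((n = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            text.Append(Encoding.ASCII.GetString(buffer, 0, n));
        }
        return text.ToString();
    }

    public static string Run()
    {
        var listener = new TcpListener(IPAddress.Loopback, 0); // port 0: OS picks a port
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;

        var serverTask = Task.Run(() =>
        {
            using (var accepted = listener.AcceptTcpClient())
            using (var stream = accepted.GetStream())
            {
                // Completes once the client has half-closed its sending side.
                string request = ReadToEnd(stream);
                byte[] reply = Encoding.ASCII.GetBytes("echo:" + request);
                stream.Write(reply, 0, reply.Length);
            } // The server closes after replying.
        });

        string response;
        using (var client = new TcpClient())
        {
            client.Connect(IPAddress.Loopback, port);
            NetworkStream stream = client.GetStream();
            byte[] payload = Encoding.ASCII.GetBytes("ping");
            stream.Write(payload, 0, payload.Length);

            // Half-close: no more sending from this side, but receiving still works.
            client.Client.Shutdown(SocketShutdown.Send);

            response = ReadToEnd(stream);
        }

        serverTask.Wait();
        listener.Stop();
        return response;
    }
}
```

Here the half-close doubles as an end-of-request marker, so the server needs no length prefix to know when the request is complete. Because the client sends the first FIN, it is also the side that ends up in TIME_WAIT.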

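I must reiterate the advice that if you can keep a connection open and idle until you need it again, this usually has a massive effect on reducing TIME_WAITs.

Beyond that, try to measure when TIME_WAIT actually becomes a problem. It really does take a lot of connection churn to exhaust TCP resources.

I hope the above is of some help.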