When is a TCP connection considered idle?

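I need to enable TCP keepalive on every connection, and I am now struggling with the results of our test cases. I think this is because I do not really understand when the first keepalive probe is sent. I read the following in the Linux documentation for tcp_keepalive_time: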

the interval between the last data packet sent (simple ACKs are not considered data) and the first keepalive probe; after the connection is marked to need keepalive, this counter is not used any further

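Some other sources state that this is the time a connection is idle, but they do not further define what this means. I also looked into Stevens for a more formal definition, because I am wondering what "the last data packet sent" actually means when retransmissions are taken into account.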

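In my test case, I have a connection where data is only sent from a server to a client at rather high rates. To test keepalive, we unplugged the cable on the client's NIC. I can now see that the network stack tries to send the data and enters the retransmission state, but no keepalive probe is sent. Is it correct that keepalive probes are not sent during retransmission?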

I have a connection where data is only sent from a server to a client at rather high rates.

Then you will never see a keepalive. Keepalives are sent when there is "silence on the wire". RFC1122 has some explanation of keepalive.

A "keep-alive" mechanism periodically probes the other end of a connection when the connection is otherwise idle, even when there is no data to be sent

Back to your question:

Some other sources state that this is the time a connection is idle, but they do not further define what this means.

This is the time TCP waits before probing the peer with a "Hey! Still alive?".

$ cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
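
The 7200 above is only one of three system-wide defaults. As a minimal sketch, assuming a Linux host that exposes the usual sysctl files under /proc/sys/net/ipv4, all three can be read like this. The helper name `read_keepalive_defaults` and its `base` parameter are made up for illustration:

```python
from pathlib import Path

# Sysctl files that hold the system-wide keepalive defaults on Linux.
KEEPALIVE_SYSCTLS = (
    "tcp_keepalive_time",    # idle seconds before the first probe
    "tcp_keepalive_intvl",   # seconds between unanswered probes
    "tcp_keepalive_probes",  # unanswered probes before the connection is dropped
)


def read_keepalive_defaults(base="/proc/sys/net/ipv4"):
    """Return the keepalive sysctls found under `base` as a dict of ints.

    Files that do not exist (for example on a non-Linux host) are skipped.
    """
    values = {}
    for name in KEEPALIVE_SYSCTLS:
        path = Path(base) / name
        if path.exists():
            values[name] = int(path.read_text().strip())
    return values


if __name__ == "__main__":
    for name, value in read_keepalive_defaults().items():
        print(f"{name} = {value}")
```

These are only defaults; a socket still has to opt in with SO_KEEPALIVE before any of them apply.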

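In other words, you have been using a TCP connection and it has worked great. However, for the past 2 hours there has been nothing to send. Is it reasonable to assume the connection is still alive? Is it reasonable to assume that all the middleboxes in between still have state for your connection? Opinions differ, and keepalive is not part of RFC793.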

The TCP specification does not include a keep-alive mechanism because it could: (1) cause perfectly good connections to break during transient Internet failures; (2) consume unnecessary bandwidth ("if no one is using the connection, who cares if it is still good?")


To test keepalive, we unplugged the cable on the client's NIC.

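This is not testing keepalive. It is testing your TCP retransmission strategy, i.e. how many times and how often TCP tries to deliver your message. On a Linux box, this (probably) ends up testing net.ipv4.tcp_retries2: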

How may times to retry before killing alive TCP connection. RFC 1122 says that the limit should be longer than 100 sec. It is too small number. Default value 15 corresponds to 13-30min depending on RTO.
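
To see where a figure of that order comes from, here is a rough back-of-the-envelope sketch. It assumes a plain exponential backoff that starts at 200 ms and is capped at 120 s (my assumption for the Linux minimum and maximum RTO, not something stated in the quote above). The real RTO depends on the measured round-trip time, so treat the result as a lower bound only. The function name is made up for illustration:

```python
def retransmit_timeout(retries=15, rto_initial=0.2, rto_max=120.0):
    """Approximate seconds until TCP gives up on unacknowledged data.

    Assumes the RTO doubles after every timeout and is capped at rto_max.
    There are retries + 1 waits: one after the original transmission and
    one after each retransmission.
    """
    total = 0.0
    rto = rto_initial
    for _ in range(retries + 1):
        total += min(rto, rto_max)
        rto *= 2
    return total


if __name__ == "__main__":
    seconds = retransmit_timeout()
    print(f"{seconds:.1f} s, about {seconds / 60:.0f} min")
```

With the assumed values this comes to roughly 924.6 seconds, a bit over 15 minutes, which sits at the low end of the 13-30 min range quoted above.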

However, RFC5482 - TCP User Timeout Option provides more ways to influence this.

The TCP user timeout controls how long transmitted data may remain unacknowledged before a connection is forcefully closed.

Back to the question:

Is it correct that keep alive probes are not sent during retransmission

That makes sense: TCP is already trying to elicit a response from the other peer, so an empty keepalive would be superfluous.


Linux-specific (2.4+) options to influence keepalive

  • TCP_KEEPCNT The maximum number of keepalive probes TCP should send before dropping the connection.

  • TCP_KEEPIDLE The time (in seconds) the connection needs to remain idle before TCP starts sending keepalive probes, if the socket option SO_KEEPALIVE has been set on this socket

  • TCP_KEEPINTVL The time (in seconds) between individual keepalive probes
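
For illustration, the three options above could be set per socket roughly as follows. This is a sketch that assumes Linux and Python's standard socket module; the helper name `enable_keepalive` and the values 60/10/5 are arbitrary examples, not recommendations:

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on keepalive for `sock` and override the system-wide defaults.

    Returns the approximate number of seconds an idle connection to a dead
    peer survives: `idle` seconds of silence, then `count` unanswered probes
    sent `interval` seconds apart.
    """
    # Without SO_KEEPALIVE the TCP_KEEP* options have no effect.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return idle + interval * count


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("dead idle peer detected after about", enable_keepalive(s), "seconds")
```

Note that this only covers the idle case. As discussed above, a connection with unacknowledged data in flight is governed by retransmission, not by these timers.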

Linux-specific (2.6.37+) option to influence TCP User Timeout

TCP_USER_TIMEOUT The maximum amount of time in milliseconds that transmitted data may remain unacknowledged before TCP will forcibly close connection.

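So, for example, your application could use this option to determine how long the connection can survive when there is no connectivity (similar to your NIC unplugging example). E.g. if you have reason to believe the client will come back (perhaps they closed the laptop lid? spotty wireless access?) you can specify a timeout of 12 hours, and when they do come back the connection will still function.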
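
A minimal sketch of the 12 hour case, assuming Linux 2.6.37+ and a Python build that exposes `socket.TCP_USER_TIMEOUT`. The helper name `set_user_timeout` is made up for illustration:

```python
import socket

# 12 hours expressed in milliseconds, the unit TCP_USER_TIMEOUT expects.
TWELVE_HOURS_MS = 12 * 60 * 60 * 1000


def set_user_timeout(sock, timeout_ms):
    """Limit how long transmitted data may stay unacknowledged on `sock`."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, timeout_ms)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        set_user_timeout(s, TWELVE_HOURS_MS)
        print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT), "ms")
```

A much smaller value works the other way round: for the unplugged-cable test it would make the connection fail sooner than the tcp_retries2 default allows.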