What is the logic behind setting the valid region to half the size of the PSN range?

In order to make it possible for the responder to distinguish duplicate packets from out of order packets, a given send queue shall have a series of PSNs no greater than 8,388,608 outstanding at any given time. Therefore, a send queue shall have no more than 8,388,608 packets outstanding at any given time. This includes the sum of all SEND request packets plus all RDMA WRITE request packets plus all ATOMIC Operation request packets plus all expected RDMA READ response packets. Thus, the PSN space (consisting of a range of 16,777,216 PSNs) is divided into two regions, each occupying a range of 8,388,608 PSNs, called the valid region and the invalid region.

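Given the passage quoted above from the IBTA specification: if the valid region were larger than half of the 2^24 PSN space, why would it be impossible to distinguish duplicate packets from out-of-order packets?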

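Suppose the whole PSN range is much smaller; to keep the example simple, say 0..3. Following the spirit of the spec, the valid region would be 2 packets: the expected PSN plus 1 previous (duplicate) PSN. Now suppose we increase it to 3 packets.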
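The region check in this toy setup can be sketched in a few lines of Python. This is only an illustration, not text from the spec: the function name `classify_psn` and its parameters are hypothetical, and it assumes the valid region is the expected PSN plus the PSNs immediately behind it, with every other PSN treated as ahead of the expected one.

```python
def classify_psn(expected, received, space=4, valid_size=2):
    """Classify a received PSN relative to the expected PSN.

    Assumption (illustrative, not from the spec): the valid region is the
    expected PSN plus the (valid_size - 1) PSNs immediately behind it;
    every other PSN is treated as ahead of the expected PSN.
    """
    # Modular distance: how far `received` lies behind `expected`.
    behind = (expected - received) % space
    if behind == 0:
        return "expected"
    if behind < valid_size:
        return "duplicate"
    return "out_of_order"


# Receiver expects PSN 1 and sees PSN 3 in the 0..3 space.
print(classify_psn(1, 3, valid_size=2))  # out_of_order
print(classify_psn(1, 3, valid_size=3))  # duplicate
```

With the same receiver state and the same incoming PSN, the verdict depends only on the chosen region size, which is why that size has to be matched by a limit on what the sender may have outstanding.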

Consider the following two scenarios:

Out-of-order scenario

Sender sends  | Receiver sends
Send 0        | Ack 0
Send 1 (lost) |
Send 2 (lost) |
Send 3        | ?

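After receiving Send 0, the receiver's expected PSN is 1. When the receiver gets the 4th packet (Send 3), it is an out-of-order packet, 2 ahead of the expected PSN. The responder should treat this as a sequence error.

Duplicate scenario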

Sender sends     | Receiver receives | Receiver sends
Send 3           | Send 3            | Ack 3 (lost)
Send 3 (delayed) |                   |
Send 0           | Send 0            | Ack 0
                 | Send 3 (delayed)  | ?

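Here the sender times out waiting for the lost ack and retransmits Send 3. The retransmission is delayed in the network, and the receiver sees it only after it has received Send 0. The receiver's expected PSN is 1, and the packet it receives falls inside the valid region (2 packets behind), so it should be treated as a duplicate packet.

Summary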

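As you can see, in both scenarios the receiver state (expected PSN) is the same and the received packet carries the same PSN, so with a valid region of 3 the two cases cannot be told apart. If we limit the valid region to 2, the first scenario cannot happen, because the sender must wait for the ack of PSN 1 before it may send PSN 3.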
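The overlap can also be checked mechanically. The sketch below is a toy model, not an implementation of the spec: `ambiguous_psns` is a hypothetical helper, and it assumes a single `size` bounds both how many packets the sender may have outstanding (starting at the expected PSN) and how large the valid region is (the expected PSN plus `size - 1` PSNs behind it).

```python
def ambiguous_psns(expected, space, size):
    """Return PSNs that could be either 'ahead' or 'behind' the expected PSN.

    Toy-model assumptions (not from the spec):
    - up to `size` packets may be outstanding, starting at `expected`,
      so PSNs expected+1 .. expected+size-1 may arrive out of order;
    - the valid region is `expected` plus the size-1 PSNs behind it,
      so PSNs expected-1 .. expected-(size-1) may arrive as duplicates.
    """
    ahead = {(expected + k) % space for k in range(1, size)}
    behind = {(expected - k) % space for k in range(1, size)}
    return sorted(ahead & behind)


for size in (2, 3):
    print(size, ambiguous_psns(expected=1, space=4, size=size))
# 2 []
# 3 [3]
```

In this model the two sets stay disjoint as long as `size` is at most half of `space`, which mirrors the 8,388,608 out of 16,777,216 split in the quoted text.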