What could turn on the TCP Keep-Alive flag? Is my traffic being corrupted?

Background

I'm stress-testing my client-server application. Both ends are C++ programs that use epoll for event detection.

In this test, each of them runs on CentOS 7 inside an Oracle VirtualBox 5.0.22 instance, and they communicate through VirtualBox's Host-Only Ethernet adapter (type: Intel PRO/1000 MT Desktop (82540EM)).

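The client opens a TCP/IP connection to the server, exchanges a few application-level handshake messages, and then keeps the connection alive by sending a single ASCII 0x20 (space) byte every ten seconds. Call this a "ping". Once either side has missed a certain number of expected pings, the connection is closed.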
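To make the ping rule concrete, here is a minimal Python sketch of the miss-counting logic (my real code is C++; this is only an illustration). The class name `PingMonitor` and the threshold of three missed pings are assumptions made up for the example, since the actual threshold is not stated above.

```python
PING_BYTE = b"\x20"     # the single space byte used as the "ping"
PING_INTERVAL = 10.0    # seconds between pings, as described above
MAX_MISSED = 3          # ASSUMED threshold; the real value is not given


class PingMonitor:
    """Tracks how many expected pings a peer has missed (hypothetical helper)."""

    def __init__(self, now, interval=PING_INTERVAL, max_missed=MAX_MISSED):
        self.interval = interval
        self.max_missed = max_missed
        self.last_seen = now

    def on_ping(self, now):
        # Receiving a ping resets the miss count.
        self.last_seen = now

    def missed(self, now):
        # Whole ping intervals that have elapsed since the last ping arrived.
        return int((now - self.last_seen) // self.interval)

    def should_close(self, now):
        # The connection is dropped once enough pings have gone missing.
        return self.missed(now) >= self.max_missed
```

Timestamps are passed in explicitly rather than read from a clock, which keeps the logic easy to exercise in isolation.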

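In some cases the server can also open a connection to the client, to re-establish communication more quickly (for example, after a server restart). In most configurations the client eventually reopens its own outgoing connection anyway, and the server's connection is then closed as "redundant".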

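This works fine at small scale, but things fall apart when I try to simulate many clients on the network. Because the server requires every client to be on a different IP, I simulate that by creating a number of "virtual interfaces" in 192.168.21.0/24 and adding a route.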

Suppose I'm simulating 20 clients. To set up the 12th, I run this on my client VM:

ip link add link enp0s8 sbsim12 type macvlan
ip link set up dev sbsim12
ip addr add 192.168.21.12/24 broadcast 192.168.21.255 dev sbsim12

(enp0s8 is the VirtualBox Host-Only adapter.)
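For all 20 simulated clients the same three commands are repeated with a different index. A small Python sketch of how they could be generated is below; the function name `macvlan_commands` and its parameters are hypothetical, and it only builds the command strings without running anything.

```python
def macvlan_commands(index, parent="enp0s8", prefix="sbsim",
                     network="192.168.21", masklen=24):
    """Return the three `ip` commands for simulated client number `index`."""
    if not 1 <= index <= 254:
        raise ValueError("index must be a usable host number in a /24")
    dev = f"{prefix}{index}"
    return [
        # Create a macvlan interface on top of the Host-Only adapter.
        f"ip link add link {parent} {dev} type macvlan",
        # Bring the new interface up.
        f"ip link set up dev {dev}",
        # Give it its own address inside the simulated subnet.
        f"ip addr add {network}.{index}/{masklen} "
        f"broadcast {network}.255 dev {dev}",
    ]
```

Calling `macvlan_commands(i)` for `i` in `range(1, 21)` and printing each string would give a script covering the 20 clients; it would still need to be run as root on the client VM.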

Then, on the server VM:

ip route add 192.168.21.0/24 dev enp0s8

An instance of my client can then bind to 192.168.21.12, and from then on, as far as my system is concerned, that appears to be its IP.
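For reference, "binding to 192.168.21.12" means binding the socket to that local address before connecting. A minimal Python sketch of the idea follows (the real client is C++, and `connect_from` is a name invented for this example):

```python
import socket


def connect_from(source_ip, server_addr, timeout=5.0):
    """Open a TCP connection whose local (source) address is `source_ip`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.settimeout(timeout)
        # Port 0 lets the kernel pick an ephemeral source port.
        sock.bind((source_ip, 0))
        sock.connect(server_addr)
    except OSError:
        sock.close()
        raise
    return sock
```

With the macvlan interface in place, something like `connect_from("192.168.21.12", ("192.168.99.100", 2000))` would correspond to the traffic shown in the capture further down, where the server address and port are the ones that appear in the trace.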

Problem

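This mechanism worked well for us back when our application communicated over UDP, and it still works at small scale. But as I start more and more clients at once, I begin to see strange behaviour. The symptoms vary, but the general pattern seems to be that TCP/IP connections stall. With extensive debug output in my application, I can see the sender correctly detecting EPOLLOUT on its socket and sending to it, but occasionally the receiver never detects EPOLLIN, so the data is effectively lost. This happens once every few runs, and it becomes more likely as the number of clients increases.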
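The behaviour I expect, and normally get, can be shown with a tiny Linux-only Python sketch. It uses a local `socket.socketpair()` as a stand-in for the TCP connection, so it says nothing about macvlan or VirtualBox; it only illustrates the EPOLLOUT-then-EPOLLIN sequence. The function name `ping_roundtrip` is made up for the example.

```python
import select
import socket


def ping_roundtrip(timeout=1.0):
    """Send one byte over a local socket pair and report the epoll events seen."""
    sender, receiver = socket.socketpair()
    ep = select.epoll()
    try:
        sender.setblocking(False)
        receiver.setblocking(False)
        s_fd, r_fd = sender.fileno(), receiver.fileno()
        ep.register(s_fd, select.EPOLLOUT)
        ep.register(r_fd, select.EPOLLIN)

        # Before anything is sent, only the sender should be ready (writable).
        before = dict(ep.poll(timeout))
        sender.send(b"\x20")
        # After the send, the receiver should be reported as readable.
        after = dict(ep.poll(timeout))

        readable = bool(after.get(r_fd, 0) & select.EPOLLIN)
        data = receiver.recv(16) if readable else b""
        return {
            "sender_writable_before": bool(before.get(s_fd, 0) & select.EPOLLOUT),
            "receiver_readable_before": bool(before.get(r_fd, 0) & select.EPOLLIN),
            "receiver_readable_after": readable,
            "data": data,
        }
    finally:
        ep.close()
        sender.close()
        receiver.close()
```

In the failing runs, the equivalent of `receiver_readable_after` never becomes true on the receiving side, even though the sender saw EPOLLOUT and its send succeeded.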

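I have spent what feels like a decade forensically analysing the correctness of my application logic, and I'm beginning to wonder whether I've run into some kind of networking bug at a lower layer, either in macvlan territory or in VirtualBox driver territory.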

To rule that out, I need someone who knows TCP better than I do to confirm or deny that the following really is strange.

What exactly is going on in this packet flow?

No.     Time           Source                Destination           Protocol Info
  26496 581.345275     192.168.21.51         192.168.99.100        TCP      42551→cisco-sccp(2000) [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=377482702 TSecr=0 WS=128
  26499 581.345711     192.168.99.100        192.168.21.51         TCP      cisco-sccp(2000)→42551 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=381905815 TSecr=377482702 WS=128
  26500 581.345936     192.168.21.51         192.168.99.100        TCP      42551→cisco-sccp(2000) [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=377482703 TSecr=381905815
  26516 581.349421     192.168.99.100        192.168.21.51         TCP      cisco-sccp(2000)→42551 [PSH, ACK] Seq=1 Ack=1 Win=29312 Len=131 TSval=381905865 TSecr=377482703
  26519 581.349661     192.168.21.51         192.168.99.100        TCP      42551→cisco-sccp(2000) [ACK] Seq=1 Ack=132 Win=30336 Len=0 TSval=377482706 TSecr=381905865
  26647 581.394528     192.168.21.51         192.168.99.100        TCP      42551→cisco-sccp(2000) [PSH, ACK] Seq=1 Ack=132 Win=30336 Len=131 TSval=377482751 TSecr=381905865
  26648 581.394574     192.168.99.100        192.168.21.51         TCP      cisco-sccp(2000)→42551 [ACK] Seq=132 Ack=132 Win=30336 Len=0 TSval=381905911 TSecr=377482751
  26690 581.401738     192.168.21.51         192.168.99.100        TCP      42551→cisco-sccp(2000) [PSH, ACK] Seq=132 Ack=132 Win=30336 Len=289 TSval=377482758 TSecr=381905911
  26691 581.401756     192.168.99.100        192.168.21.51         TCP      cisco-sccp(2000)→42551 [ACK] Seq=132 Ack=421 Win=31360 Len=0 TSval=381905918 TSecr=377482758
  26735 581.418696     192.168.99.100        192.168.21.51         TCP      cisco-sccp(2000)→42551 [PSH, ACK] Seq=132 Ack=421 Win=31360 Len=48 TSval=381905935 TSecr=377482758
  26737 581.418927     192.168.21.51         192.168.99.100        TCP      42551→cisco-sccp(2000) [ACK] Seq=421 Ack=180 Win=30336 Len=0 TSval=377482776 TSecr=381905935
  26749 581.432843     192.168.99.100        192.168.21.51         TCP      cisco-sccp(2000)→42551 [PSH, ACK] Seq=180 Ack=421 Win=31360 Len=45 TSval=381905949 TSecr=377482776
  26751 581.433022     192.168.21.51         192.168.99.100        TCP      42551→cisco-sccp(2000) [ACK] Seq=421 Ack=225 Win=30336 Len=0 TSval=377482790 TSecr=381905949
  26758 581.436982     192.168.21.51         192.168.99.100        TCP      42551→cisco-sccp(2000) [PSH, ACK] Seq=421 Ack=225 Win=30336 Len=819 TSval=377482793 TSecr=381905949
  26793 581.476317     192.168.99.100        192.168.21.51         TCP      cisco-sccp(2000)→42551 [ACK] Seq=225 Ack=1240 Win=33024 Len=0 TSval=381905993 TSecr=377482793
  26892 581.579434     192.168.99.100        192.168.21.51         TCP      cisco-sccp(2000)→42551 [PSH, ACK] Seq=225 Ack=1240 Win=33024 Len=64 TSval=381906096 TSecr=377482793
  26950 581.619040     192.168.21.51         192.168.99.100        TCP      42551→cisco-sccp(2000) [ACK] Seq=1240 Ack=289 Win=30336 Len=0 TSval=377482976 TSecr=381906096
  27012 581.652478     192.168.21.51         192.168.99.100        TCP      42551→cisco-sccp(2000) [PSH, ACK] Seq=1240 Ack=289 Win=30336 Len=1230 TSval=377483007 TSecr=381906096
  27013 581.652520     192.168.99.100        192.168.21.51         TCP      cisco-sccp(2000)→42551 [ACK] Seq=289 Ack=2470 Win=35968 Len=0 TSval=381906168 TSecr=377483007
  28392 590.844958     192.168.99.100        192.168.21.51         TCP      cisco-sccp(2000)→42551 [PSH, ACK] Seq=289 Ack=2470 Win=35968 Len=1 TSval=381915361 TSecr=377483007
  28427 590.955619     192.168.21.51         192.168.99.100        TCP      42551→cisco-sccp(2000) [PSH, ACK] Seq=2470 Ack=289 Win=30336 Len=1 TSval=377492312 TSecr=381906168
  28428 590.955628     192.168.99.100        192.168.21.51         TCP      cisco-sccp(2000)→42551 [ACK] Seq=290 Ack=2471 Win=35968 Len=0 TSval=381915472 TSecr=377492312
  28457 591.077735     192.168.99.100        192.168.21.51         TCP      [TCP Keep-Alive] cisco-sccp(2000)→42551 [PSH, ACK] Seq=289 Ack=2471 Win=35968 Len=1 TSval=381915594 TSecr=377492312
  28494 591.161676     192.168.21.51         192.168.99.100        TCP      [TCP Keep-Alive] 42551→cisco-sccp(2000) [PSH, ACK] Seq=2470 Ack=289 Win=30336 Len=1 TSval=377492518 TSecr=381906168
  28495 591.161733     192.168.99.100        192.168.21.51         TCP      [TCP Keep-Alive ACK] cisco-sccp(2000)→42551 [ACK] Seq=290 Ack=2471 Win=35968 Len=0 TSval=381915678 TSecr=377492518 SLE=2470 SRE=2471
  28526 591.367239     192.168.21.51         192.168.99.100        TCP      [TCP Keep-Alive] 42551→cisco-sccp(2000) [PSH, ACK] Seq=2470 Ack=289 Win=30336 Len=1 TSval=377492724 TSecr=381906168
  28527 591.367344     192.168.99.100        192.168.21.51         TCP      [TCP Keep-Alive ACK] cisco-sccp(2000)→42551 [ACK] Seq=290 Ack=2471 Win=35968 Len=0 TSval=381915883 TSecr=377492724 SLE=2470 SRE=2471
  28566 591.776390     192.168.99.100        192.168.21.51         TCP      [TCP Keep-Alive] cisco-sccp(2000)→42551 [PSH, ACK] Seq=289 Ack=2471 Win=35968 Len=1 TSval=381916293 TSecr=377492724
  28567 591.780375     192.168.21.51         192.168.99.100        TCP      [TCP Keep-Alive] 42551→cisco-sccp(2000) [PSH, ACK] Seq=2470 Ack=289 Win=30336 Len=1 TSval=377493137 TSecr=381906168
  28568 591.780472     192.168.99.100        192.168.21.51         TCP      [TCP Keep-Alive ACK] cisco-sccp(2000)→42551 [ACK] Seq=290 Ack=2471 Win=35968 Len=0 TSval=381916297 TSecr=377493137 SLE=2470 SRE=2471
  28601 592.243918     192.168.99.100        192.168.21.51         TCP      [TCP Keep-Alive] cisco-sccp(2000)→42551 [PSH, ACK] Seq=289 Ack=2471 Win=35968 Len=1 TSval=381916760 TSecr=377493137
  28639 592.607472     192.168.21.51         192.168.99.100        TCP      [TCP Keep-Alive] 42551→cisco-sccp(2000) [PSH, ACK] Seq=2470 Ack=289 Win=30336 Len=1 TSval=377493964 TSecr=381906168
  28640 592.607575     192.168.99.100        192.168.21.51         TCP      [TCP Keep-Alive ACK] cisco-sccp(2000)→42551 [ACK] Seq=290 Ack=2471 Win=35968 Len=0 TSval=381917124 TSecr=377493964 SLE=2470 SRE=2471
  28729 593.177610     192.168.99.100        192.168.21.51         TCP      [TCP Keep-Alive] cisco-sccp(2000)→42551 [PSH, ACK] Seq=289 Ack=2471 Win=35968 Len=1 TSval=381917694 TSecr=377493964
  28826 594.259300     192.168.21.51         192.168.99.100        TCP      [TCP Keep-Alive] 42551→cisco-sccp(2000) [PSH, ACK] Seq=2470 Ack=289 Win=30336 Len=1 TSval=377495616 TSecr=381906168
  28827 594.259358     192.168.99.100        192.168.21.51         TCP      [TCP Keep-Alive ACK] cisco-sccp(2000)→42551 [ACK] Seq=290 Ack=2471 Win=35968 Len=0 TSval=381918776 TSecr=377495616 SLE=2470 SRE=2471
  28863 595.043696     192.168.99.100        192.168.21.51         TCP      [TCP Keep-Alive] cisco-sccp(2000)→42551 [PSH, ACK] Seq=289 Ack=2471 Win=35968 Len=1 TSval=381919560 TSecr=377495616
  29669 597.563164     192.168.21.51         192.168.99.100        TCP      [TCP Keep-Alive] 42551→cisco-sccp(2000) [PSH, ACK] Seq=2470 Ack=289 Win=30336 Len=1 TSval=377498920 TSecr=381906168
  29670 597.563296     192.168.99.100        192.168.21.51         TCP      [TCP Keep-Alive ACK] cisco-sccp(2000)→42551 [ACK] Seq=290 Ack=2471 Win=35968 Len=0 TSval=381922079 TSecr=377498920 SLE=2470 SRE=2471
  30012 598.779594     192.168.99.100        192.168.21.51         TCP      [TCP Keep-Alive] cisco-sccp(2000)→42551 [PSH, ACK] Seq=289 Ack=2471 Win=35968 Len=1 TSval=381923296 TSecr=377498920
  30485 604.179630     192.168.21.51         192.168.99.100        TCP      [TCP Keep-Alive] 42551→cisco-sccp(2000) [PSH, ACK] Seq=2470 Ack=289 Win=30336 Len=1 TSval=377505536 TSecr=381906168
  30486 604.179745     192.168.99.100        192.168.21.51         TCP      [TCP Keep-Alive ACK] cisco-sccp(2000)→42551 [ACK] Seq=290 Ack=2471 Win=35968 Len=0 TSval=381928696 TSecr=377505536 SLE=2470 SRE=2471
  30679 606.251285     192.168.99.100        192.168.21.51         TCP      [TCP Keep-Alive] cisco-sccp(2000)→42551 [PSH, ACK] Seq=289 Ack=2471 Win=35968 Len=1 TSval=381930768 TSecr=377505536
  30824 610.881089     192.168.21.51         192.168.99.100        TCP      42551→cisco-sccp(2000) [FIN, PSH, ACK] Seq=2471 Ack=289 Win=30336 Len=1 TSval=377512238 TSecr=381906168
  30825 610.881786     192.168.21.51         192.168.99.100        TCP      45431→cisco-sccp(2000) [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=377512238 TSecr=0 WS=128
  30826 610.881829     192.168.99.100        192.168.21.51         TCP      cisco-sccp(2000)→45431 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=381935398 TSecr=377512238 WS=128
  30858 610.885132     192.168.99.100        192.168.21.51         TCP      cisco-sccp(2000)→42551 [FIN, PSH, ACK] Seq=290 Ack=2473 Win=35968 Len=1 TSval=381935401 TSecr=377512238
  30937 611.883833     192.168.21.51         192.168.99.100        TCP      [TCP Spurious Retransmission] 45431→cisco-sccp(2000) [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=377513240 TSecr=0 WS=128
  30938 611.884005     192.168.99.100        192.168.21.51         TCP      [TCP Retransmission] cisco-sccp(2000)→45431 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=381936400 TSecr=377512238 WS=128
  30973 612.884024     192.168.99.100        192.168.21.51         TCP      [TCP Retransmission] cisco-sccp(2000)→45431 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=381937400 TSecr=377512238 WS=128
  30996 613.887453     192.168.21.51         192.168.99.100        TCP      [TCP Spurious Retransmission] 45431→cisco-sccp(2000) [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=377515244 TSecr=0 WS=128
  30997 613.887564     192.168.99.100        192.168.21.51         TCP      [TCP Retransmission] cisco-sccp(2000)→45431 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=381938404 TSecr=377512238 WS=128
  31123 616.083906     192.168.99.100        192.168.21.51         TCP      [TCP Retransmission] cisco-sccp(2000)→45431 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=381940600 TSecr=377512238 WS=128
  31195 617.395119     192.168.21.51         192.168.99.100        TCP      [TCP Spurious Retransmission] 42551→cisco-sccp(2000) [FIN, PSH, ACK] Seq=2470 Ack=289 Win=30336 Len=2 TSval=377518752 TSecr=381906168
  31196 617.395213     192.168.99.100        192.168.21.51         TCP      [TCP Dup ACK 30858#1] cisco-sccp(2000)→42551 [ACK] Seq=292 Ack=2473 Win=35968 Len=0 TSval=381941911 TSecr=377518752 SLE=2470 SRE=2473
  31197 617.891274     192.168.21.51         192.168.99.100        TCP      [TCP Spurious Retransmission] 45431→cisco-sccp(2000) [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=377519248 TSecr=0 WS=128
  31198 617.891377     192.168.99.100        192.168.21.51         TCP      [TCP Retransmission] cisco-sccp(2000)→45431 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=381942407 TSecr=377512238 WS=128
  31358 621.211512     192.168.99.100        192.168.21.51         TCP      [TCP Retransmission] cisco-sccp(2000)→42551 [FIN, PSH, ACK] Seq=289 Ack=2473 Win=35968 Len=2 TSval=381945728 TSecr=377518752
  31392 622.484650     192.168.99.100        192.168.21.51         TCP      [TCP Retransmission] cisco-sccp(2000)→45431 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=381947001 TSecr=377512238 WS=128
  31465 625.907246     192.168.21.51         192.168.99.100        TCP      [TCP Spurious Retransmission] 45431→cisco-sccp(2000) [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=377527264 TSecr=0 WS=128
  31466 625.907346     192.168.99.100        192.168.21.51         TCP      [TCP Retransmission] cisco-sccp(2000)→45431 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=381950423 TSecr=377512238 WS=128
  31847 634.085643     192.168.99.100        192.168.21.51         TCP      [TCP Retransmission] cisco-sccp(2000)→45431 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=381958602 TSecr=377512238 WS=128
  32326 641.938500     192.168.21.51         192.168.99.100        TCP      [TCP Spurious Retransmission] 45431→cisco-sccp(2000) [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=377543296 TSecr=0 WS=128
  32327 641.938568     192.168.99.100        192.168.21.51         TCP      [TCP Retransmission] cisco-sccp(2000)→45431 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=381966455 TSecr=377512238 WS=128
  32458 643.859279     192.168.21.51         192.168.99.100        TCP      [TCP Spurious Retransmission] 42551→cisco-sccp(2000) [FIN, PSH, ACK] Seq=2470 Ack=289 Win=30336 Len=2 TSval=377545216 TSecr=381906168
  32459 643.859394     192.168.99.100        192.168.21.51         TCP      [TCP Dup ACK 30858#2] cisco-sccp(2000)→42551 [ACK] Seq=292 Ack=2473 Win=35968 Len=0 TSval=381968375 TSecr=377545216 SLE=2470 SRE=2473
  32861 651.099614     192.168.99.100        192.168.21.51         TCP      [TCP Retransmission] cisco-sccp(2000)→42551 [FIN, PSH, ACK] Seq=289 Ack=2473 Win=35968 Len=2 TSval=381975616 TSecr=377545216
  33374 658.088603     192.168.99.100        192.168.21.51         TCP      [TCP Retransmission] cisco-sccp(2000)→45431 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=381982605 TSecr=377512238 WS=128
  34426 674.002725     192.168.21.51         192.168.99.100        TCP      [TCP Spurious Retransmission] 45431→cisco-sccp(2000) [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=377575360 TSecr=0 WS=128
  34433 674.004602     192.168.99.100        192.168.21.51         TCP      cisco-sccp(2000)→45431 [RST, ACK] Seq=668009898 Ack=1 Win=0 Len=0
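A possible partial answer to the title question: the `[TCP Keep-Alive]` label is an annotation added by Wireshark's TCP analysis, not a flag carried in the packet. As far as I can tell from the Wireshark documentation, it is applied when a segment is zero or one byte long, its sequence number is one less than the next expected sequence number, and none of SYN, FIN, or RST is set. A retransmitted one-byte ping matches that description. The Python sketch below is a simplified paraphrase of that rule, not the dissector's actual code; the function name is invented, and the "next expected" values were worked out by hand from the trace above.

```python
def looks_like_keepalive(seq, length, next_expected_seq, flags):
    """Approximate the documented Keep-Alive rule for a single segment."""
    if length not in (0, 1):
        return False                      # only 0- or 1-byte segments qualify
    if {"SYN", "FIN", "RST"} & set(flags):
        return False                      # connection-control segments never qualify
    return seq == next_expected_seq - 1   # re-sends the last byte already sent


# (frame, seq, len, next expected seq before this frame, flags)
FRAMES = [
    (28427, 2470, 1, 2470, {"PSH", "ACK"}),         # first copy of the ping
    (28494, 2470, 1, 2471, {"PSH", "ACK"}),         # same byte sent again
    (30824, 2471, 1, 2471, {"FIN", "PSH", "ACK"}),  # new byte plus FIN
]

LABELS = {frame: looks_like_keepalive(seq, length, nxt, flags)
          for frame, seq, length, nxt, flags in FRAMES}
```

Under this reading, frame 28427 is an ordinary data segment, frame 28494 is labelled Keep-Alive only because it repeats the same byte, and frame 30824 is not labelled because it carries FIN. That would suggest retransmitted pings rather than kernel keep-alive probes, though it does not explain why the originals went unacknowledged.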

Everyone I've spoken to agrees that there is "weirdness" in the TCP interaction.
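One way to put a number on that weirdness is to look at the spacing of the client's repeated one-byte segments (Seq=2470, Len=1) in the trace above. The Python sketch below just does the arithmetic on the capture timestamps; the helper names are made up for the example.

```python
# Capture times (seconds) of every client segment with Seq=2470 Len=1,
# copied from the trace (frames 28427 through 30485).
CLIENT_PING_TIMES = [
    590.955619, 591.161676, 591.367239, 591.780375,
    592.607472, 594.259300, 597.563164, 604.179630,
]


def gaps(times):
    """Return the delay between each consecutive pair of timestamps."""
    return [round(b - a, 6) for a, b in zip(times, times[1:])]


def ratios(values):
    """Return each value divided by the one before it."""
    return [round(b / a, 2) for a, b in zip(values, values[1:])]


PING_GAPS = gaps(CLIENT_PING_TIMES)
GAP_RATIOS = ratios(PING_GAPS)
```

The gaps come out at roughly 0.21, 0.21, 0.41, 0.83, 1.65, 3.30 and 6.62 seconds, so after the first two they double each time. Doubling intervals are what I would expect from TCP's retransmission timer backing off, whereas kernel keep-alive probes are sent at a fixed interval. The client's TSecr also stays at 381906168 on every one of these segments, which seems to mean the client never processed anything the server sent after frame 27013, even though the capture shows the server acknowledging each copy.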

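I rewrote my application so that it can talk to multiple simulated clients on the same IP but different ports, and the problem went away completely.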

So either macvlan in kernel 3.10.0 isn't suited to being used this way, or I set it up incorrectly. Or both.