Keepalived gets into a bad state where a single packet is repeatedly flooded

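I have two servers running Keepalived, using direct routing for failover and load balancing. The setup works fine for a while, but eventually it stops responding. When I look at tcpdump, I see a flood of messages like this: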

15:14:55.943992 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.944173 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.944183 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.944370 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.944379 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.944571 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.944581 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.944755 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.944764 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.944952 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.944967 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.945140 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.945150 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.945322 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.945331 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.945506 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.945514 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.945701 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0
15:14:55.945710 IP (tos 0x0, ttl 59, id 32319, offset 0, flags [DF], proto TCP (6), length 60)
    10.31.109.208.50132 > 10.18.28.224.https: Flags [S], cksum 0x7cb9 (correct), seq 1334967248, win 29200, options [mss 1460,sackOK,TS val 2453083948 ecr 0,nop,wscale 7], length 0

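10.31.109.208 is my address. The packets keep coming even after I close the browser. Restarting keepalived or Nginx does not fix the problem; rebooting seems to be the only thing that does. While this is happening, the server cannot even talk to itself on that interface, which makes me think it is not a routing problem.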

Follow the instructions here. They are old, but they still apply: http://gcharriere.com/blog/?p=339

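You need to add an iptables PREROUTING rule on the second (backup) system so that packets don't bounce back and forth between the two machines.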

Something like this, where 192.168.9.100 is the VIP:

iptables -A PREROUTING -t nat -d 192.168.9.100 -p tcp -j REDIRECT

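Make sure to remove it when that machine becomes the master. iptables will happily add the same rule multiple times, so check that it isn't already present before adding it.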
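One way to handle both points is a small script that adds the rule only if `iptables -C` (check) says it is missing, and deletes every copy when the node becomes master. This is a minimal sketch, not taken from the linked post: the script name, the `master`/`backup` argument convention, and the TCP-only rule are assumptions, and the VIP is the 192.168.9.100 example from above.

```shell
#!/bin/sh
# Hypothetical helper (e.g. /usr/local/bin/vip-redirect.sh): toggles the
# REDIRECT rule for the VIP depending on the VRRP state passed as $1.
# Assumptions: VIP 192.168.9.100, TCP only, run as root.

VIP="192.168.9.100"
# Overridable so the logic can be exercised without touching real iptables.
IPTABLES="${IPTABLES:-iptables}"

# Succeeds if the exact rule is already present in nat/PREROUTING.
rule_exists() {
    "$IPTABLES" -t nat -C PREROUTING -d "$VIP" -p tcp -j REDIRECT 2>/dev/null
}

# BACKUP: add the rule only once, so repeated transitions don't stack copies.
add_redirect() {
    if ! rule_exists; then
        "$IPTABLES" -t nat -A PREROUTING -d "$VIP" -p tcp -j REDIRECT
    fi
}

# MASTER: delete every copy of the rule, in case it was added more than once.
remove_redirect() {
    while rule_exists; do
        "$IPTABLES" -t nat -D PREROUTING -d "$VIP" -p tcp -j REDIRECT
    done
}

case "${1:-}" in
    master|MASTER) remove_redirect ;;
    backup|BACKUP) add_redirect ;;
    "") ;;  # no argument: just define the functions
    *) echo "usage: $0 master|backup" >&2; exit 1 ;;
esac
```

Keepalived can call it on state changes through the `notify_master` and `notify_backup` options of the `vrrp_instance`, for example `notify_master "/usr/local/bin/vip-redirect.sh master"` and `notify_backup "/usr/local/bin/vip-redirect.sh backup"` (the path is illustrative). Checking before adding, rather than blindly appending, is what keeps the rule from piling up across repeated failovers.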