Can't establish connection over second NIC (two hops)
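We are having trouble configuring network routing on Ubuntu Xenial.
We have many servers running either Debian 8.4 (Jessie) or Ubuntu 16.04.2 (Xenial), with exactly the same network setup (or at least as far as we can see).
They all have two NICs attached to two VLANs (say "A" and "B"), both reachable through other VLANs, for example from VLAN "C".
On both systems the /etc/network/interfaces file has this form: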
NOTE: I faked names and IPs for the sake of better readability.
# VLAN A
auto eth0
iface eth0 inet static
address 192.168.111.xxx
netmask 255.255.255.0
broadcast 192.168.111.255
network 192.168.111.0
gateway 192.168.111.254
dns-nameservers 192.168.111.25 192.168.111.26
# VLAN B
auto eth1
iface eth1 inet static
address 192.168.222.xxx
netmask 255.255.255.0
broadcast 192.168.222.255
network 192.168.222.0
gateway 192.168.222.254 # <-- (Commented out in Ubuntu machine)
dns-nameservers 192.168.111.25 192.168.111.26
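...assuming xxx is 100 for the Debian machine and 200 for the Ubuntu machine, I tried to ping the following addresses from 192.168.1.10 in VLAN "C":
- 192.168.111.100: Works fine.
- 192.168.222.100: Works fine.
- 192.168.111.200: Works fine.
- 192.168.222.200: No answer!!
The "B" VLAN is used mostly for backups and other "background" traffic, to avoid saturation problems in VLAN "A".
I know that reaching the same machine through two network paths is not a common setup and, I must say, being able to connect through only one of them from other networks is not a big problem right now. But what puzzles me is: why can I reach the Debian machine but not the Ubuntu one?
On the other hand, if it worked well on both platforms, we could consider closing some services (such as ssh and backend interfaces) on NIC "A" to improve security (our firewall only allows access to VLAN "B" from our IT staff VLAN).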
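Of course, as noted in the interfaces snippet above, the gateway line is commented out on the Ubuntu machine, but that is because network initialization fails on that machine otherwise. That is, in fact, what we are trying to solve.
Still, the routing tables of both machines are almost identical. The only difference I can see is the onlink flag on the Ubuntu machine: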
myUser@debianMachine:~$ sudo ip route
default via 192.168.111.254 dev eth0
192.168.111.0/24 dev eth0 proto kernel scope link src 192.168.111.100
192.168.222.0/24 dev eth1 proto kernel scope link src 192.168.222.100
myUser@ubuntuMachine:~$ sudo ip route
default via 192.168.111.254 dev eth0 onlink
192.168.111.0/24 dev eth0 proto kernel scope link src 192.168.111.200
192.168.222.0/24 dev eth1 proto kernel scope link src 192.168.222.200
...but I was able to remove it with the following command:
myUser@ubuntuMachine:~$ sudo ip route replace default via 192.168.111.254 dev eth0
myUser@ubuntuMachine:~$ sudo ip route
default via 192.168.111.254 dev eth0
192.168.111.0/24 dev eth0 proto kernel scope link src 192.168.111.200
192.168.222.0/24 dev eth1 proto kernel scope link src 192.168.222.200
...and that did not solve the problem.
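One way to check the "wrong back routing" idea is to ask the kernel which route it would pick for a reply leaving the VLAN B address towards the VLAN C client. This is only a sketch: the run helper and its DRY_RUN switch are hypothetical names introduced here (by default the commands are only printed, not executed), and the addresses are the placeholder ones used above.

```shell
# Hypothetical helper: prints the command by default; set DRY_RUN=0 to execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Route the kernel would choose for a reply sent from the VLAN B address
# (192.168.222.200) to the VLAN C client (192.168.1.10).
run ip route get 192.168.1.10 from 192.168.222.200

# Policy rules currently in effect.
run ip rule list
```

If the first command, run for real on the Ubuntu machine, names eth0 and 192.168.111.254 rather than eth1, then replies to connections that arrived on VLAN B are leaving through VLAN A.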
After that, I also tried uncommenting the gateway line of 'VLAN B' (as I said, it is commented out in the /etc/network/interfaces file) and restarting networking, but this is what happened:
myUser@ubuntuMachine:~$ sudo /etc/init.d/networking restart
[....] Restarting networking (via systemctl): networking.serviceJob for networking.service failed because the control process exited with error code. See "systemctl status networking.service" and "journalctl -xe" for details.
failed!
...and the onlink flag came back.
As a note, after commenting that line out again and issuing a new /etc/init.d/networking restart command, the output is the same until the machine is rebooted (even though networking, apart from the VLAN B default gateway issue, continues working as usual).
Here is the output of the suggested commands:
myUser@ubuntuMachine:~$ sudo systemctl status networking.service
● networking.service - Raise network interfaces
Loaded: loaded (/lib/systemd/system/networking.service; enabled; vendor preset: enabled)
Drop-In: /run/systemd/generator/networking.service.d
└─50-insserv.conf-$network.conf
Active: failed (Result: exit-code) since jue 2017-12-21 14:55:29 CET; 42s ago
Docs: man:interfaces(5)
Process: 8552 ExecStop=/sbin/ifdown -a --read-environment --exclude=lo (code=exited, status=0/SUCCESS)
Process: 8940 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=1/FAILURE)
Process: 8934 ExecStartPre=/bin/sh -c [ "$CONFIGURE_INTERFACES" != "no" ] && [ -n "$(ifquery --read-envi
Main PID: 8940 (code=exited, status=1/FAILURE)
dic 21 14:55:29 ubuntuMachine systemd[1]: Stopped Raise network interfaces.
dic 21 14:55:29 ubuntuMachine systemd[1]: Starting Raise network interfaces...
dic 21 14:55:29 ubuntuMachine ifup[8940]: RTNETLINK answers: File exists
dic 21 14:55:29 ubuntuMachine ifup[8940]: Failed to bring up eth1.
dic 21 14:55:29 ubuntuMachine systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILUR
dic 21 14:55:29 ubuntuMachine systemd[1]: Failed to start Raise network interfaces.
dic 21 14:55:29 ubuntuMachine systemd[1]: networking.service: Unit entered failed state.
dic 21 14:55:29 ubuntuMachine systemd[1]: networking.service: Failed with result 'exit-code'.
...and the meaningful part of sudo journalctl -xe:
dic 21 14:55:29 ubuntuMachine sudo[8922]: myUser : TTY=pts/0 ; PWD=/home/myUser ; USER=root ; COMMAND=/etc/init.d/networking restart
dic 21 14:55:29 ubuntuMachine sudo[8922]: pam_unix(sudo:session): session opened for user root by myUser(uid=0)
dic 21 14:55:29 ubuntuMachine systemd[1]: Stopped Raise network interfaces.
-- Subject: Unit networking.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit networking.service has finished shutting down.
dic 21 14:55:29 ubuntuMachine systemd[1]: Starting Raise network interfaces...
-- Subject: Unit networking.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit networking.service has begun starting up.
dic 21 14:55:29 ubuntuMachine ifup[8940]: RTNETLINK answers: File exists
dic 21 14:55:29 ubuntuMachine ifup[8940]: Failed to bring up eth1.
dic 21 14:55:29 ubuntuMachine systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE
dic 21 14:55:29 ubuntuMachine systemd[1]: Failed to start Raise network interfaces.
-- Subject: Unit networking.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit networking.service has failed.
--
-- The result is failed.
dic 21 14:55:29 ubuntuMachine systemd[1]: networking.service: Unit entered failed state.
dic 21 14:55:29 ubuntuMachine systemd[1]: networking.service: Failed with result 'exit-code'.
dic 21 14:55:29 ubuntuMachine sudo[8922]: pam_unix(sudo:session): session closed for user root
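I have googled a lot and was able to find some related information, but none of it fully answers my question:
- An explanation of the "onlink" flag, which seemed to me to point to the possibility of the "onlink" flag being responsible for the "wrong back routing", since it says the flag means «telling the kernel that it does not have to check whether the gateway is directly reachable from the current machine». So (I figured) the kernel may be assuming it can (or should) route the replies to incoming connections from VLAN C through the default gateway, rather than through the same NIC the connection came in on.
  - But, as I said, removing the "onlink" flag did not seem to change anything.
- This Unix StackExchange answer seems to solve the problem (I have not tried it yet) by using multiple routing tables and rules (which tell the kernel which table to use). But it does not explain why the Debian machines work fine (I checked the /etc/iproute2/rt_tables files on both machines and they are identical too):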
myUser@bothMachines:~$ sudo cat /etc/iproute2/rt_tables
#
# reserved values
#
255 local
254 main
253 default
0 unspec
#
# local
#
#1 inr.ruhep
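So my last hypothesis is that this may just be an implementation difference between kernel versions and that, the Ubuntu kernel being the more recent one, this may be the correct behaviour, so that on modern kernels I need to use two different routing tables (but I am not sure, nor do I know why...).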
myUser@debianMachine:~$ sudo uname -a
Linux debianMachine 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) x86_64 GNU/Linux
myUser@ubuntuMachine:~$ sudo uname -a
Linux ubuntuMachine 4.4.0-87-generic #110-Ubuntu SMP Tue Jul 18 12:55:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
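So the question is:
Are we doing something wrong on the Ubuntu machines (or is there some bug in them)? Or, on the contrary, is this the correct behaviour, so that we are forced to set up a more complex routing scheme (per-VLAN routing, or two routing tables to make both default gateways work again)?
EDIT:
Now I tried adding a static route to work around the problem: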
myUser@ubuntuMachine:~$ sudo ip route add 192.168.1.0/24 via 192.168.222.254 dev eth1
...but this froze my ssh connection (through NIC A), even though I could connect through NIC B (at 192.168.111.200).
Having both routes at the same time seems to be impossible:
myUser@ubuntuMachine:~$ sudo ip route add 192.168.1/24 via 192.168.111.254 dev eth0
myUser@ubuntuMachine:~$ sudo ip route add 192.168.1/24 via 192.168.222.254 dev eth1
RTNETLINK answers: File exists
EDIT 2:
I finally found the Linux Advanced Routing & Traffic Control HOWTO, which seems to be more accurate than all the other documentation I found. Specifically, in its Chapter 4. Rules - routing policy database, I read the following text:
If you want to use this feature, make sure that your kernel is
compiled with the "IP: advanced router" and "IP: policy routing"
features
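...so everything points to my earlier hypothesis about a kernel implementation difference being correct, the specific difference being whether these two features are compiled in.
Not an authoritative answer, but my first working attempt (applying what I managed to understand):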
sudo ip route add 192.168.1.0/24 via 192.168.222.254 from 192.168.222.200 dev eth1 table 253
sudo ip rule add from 192.168.222.200 table 253
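Update: the from and dev arguments in the ip route command aren't required (it works perfectly well without them).
...after issuing the first command I still could not connect, but after issuing the second one I could.
The logic behind it comes from this text, which I found in this document: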
Linux-2.x can pack routes into several routing tables identified by a number in the range from 1 to 255 or by name from the file /etc/iproute2/rt_tables By default all normal routes are inserted into the main table (ID 254) and the kernel only uses this table when calculating routes.
Actually, one other table always exists, which is invisible but even more important. It is the local table (ID 255). This table consists of routes for local and broadcast addresses. The kernel maintains this table automatically and the administrator usually need not modify it or even look at it.
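In fact, I ended up using another routing table, identified by its id (253) rather than by a name, which I now understand is just an alias (defined in the /etc/iproute2/rt_tables file).
...checking that file again, I now see that an alias ("default") was already defined for that routing table (next to "main" which, as the text I pasted above says, is indeed 254).
I still don't know the logic behind this naming (I mean "default" for routing table 253), nor whether, for any reason, it would be better to use lower-numbered routing tables (1, 2, 3...) as this solution (already mentioned in the question) does.
But, for the sake of simplicity, if we are not going to build a complex routing policy and just want to fix this connectivity issue, I guess a good option could be something like (not yet tested):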
gateway 192.168.222.254 table 253
post-up ip rule add from 192.168.222.200 table 253
I still need to test and check whether an additional via 192.168.222.254 is needed in the gateway row, or whether that won't work at all and I need to add the route with another post-up command instead.
I will update this answer with the results.
EDIT 1: The same also works for the default route:
sudo ip route add default from 192.168.222.200 via 192.168.222.254 table 253
sudo ip rule add from 192.168.222.200 table 253
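For experimenting on a test machine, the two commands can be wrapped together with steps to inspect and undo the change. This is a sketch under assumptions: run and DRY_RUN are hypothetical names introduced here (by default nothing is executed, the commands are only printed), the from argument is left out as noted in the update above, and running with DRY_RUN=0 requires root.

```shell
# Hypothetical helper: prints the command by default; set DRY_RUN=0 to execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

SRC=192.168.222.200   # address of the second NIC (placeholder from the question)
GW=192.168.222.254    # VLAN B gateway (placeholder from the question)
TABLE=253             # routing table used in the commands above

# Add the default route to the extra table, then the rule that selects it.
run ip route add default via "$GW" dev eth1 table "$TABLE"
run ip rule add from "$SRC" table "$TABLE"

# Inspect the result.
run ip route show table "$TABLE"
run ip rule list

# Undo both changes.
run ip rule del from "$SRC" table "$TABLE"
run ip route del default via "$GW" dev eth1 table "$TABLE"
```

Keeping the undo steps next to the setup makes it easier to retry without piling up duplicate rules.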
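EDIT 2: First (now fully ¹) working approach
After playing for a while on a test machine, I think the best solution is to add the following lines to the second NIC's configuration in the /etc/network/interfaces file: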
gateway 192.168.222.254 table 1
post-up ip rule add from 192.168.222.200 table 1
pre-down ip rule del from 192.168.222.200 table 1
post-up ip route add 192.168.222.0/24 dev eth1 src 192.168.222.200 table 1
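Comments:
- Adding table 1 to the gateway keyword works well, so an additional (less readable) post-up command is not needed to add the default route.
  - ...in fact, using a specific table (other than main) for the first NIC too, with a rule similar to the one we use for the second NIC, would be a bad idea, because that rule would only apply when 192.168.111.200 is used as the source address, so there would be no "default default gateway". Keeping the first NIC's configuration in the main routing table makes all ("locally generated") outgoing connections to remote LANs go through our first default gateway.
- The first post-up command adds a rule saying that packets with that NIC's source address should be routed using table 1 (otherwise our new default gateway would never be used).
- The pre-down command removes that rule. It is not mandatory but, without it, repeated restarts of the networking service would add a duplicate of the rule each time.
- I also tried using dev eth1 instead of from 192.168.222.200 (to avoid having to repeat the address), but it did not work. I guess that, at that point, which NIC the "response" packet will use is "not yet decided".
- I used table 1 for eth1 (our second NIC), and I could use table 2 for an eventual third one, and so on. There is no need to specify any table/rule for the first NIC because it uses the main table (not "default": see the note below).
- Finally (¹), the second post-up command is what made everything work, because (as I now realize) only one routing table (the first match) is used, so the network route for the directly connected subnet (created automatically when the interface comes up) does not apply, since it was created in table main.
  - I still don't know whether there is a way to force it to be created directly in table 1.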
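Since the same four lines would be needed again for a third NIC with table 2, they can be generated instead of retyped. The function below is a hypothetical helper (its name and argument order are my own, not part of ifupdown) that only prints the lines, so they can be pasted into the matching iface stanza.

```shell
# Hypothetical helper: print the extra /etc/network/interfaces lines for a
# secondary NIC. Arguments: device, address, network/prefix, gateway, table.
secondary_nic_lines() {
    dev=$1 addr=$2 net=$3 gw=$4 table=$5
    cat <<EOF
    gateway $gw table $table
    post-up ip rule add from $addr table $table
    pre-down ip rule del from $addr table $table
    post-up ip route add $net dev $dev src $addr table $table
EOF
}

# Lines for eth1 with the placeholder addresses used in this answer.
secondary_nic_lines eth1 192.168.222.200 192.168.222.0/24 192.168.222.254 1
```

Calling it with different arguments (for instance an eth2 on table 2) yields the same pattern with the values swapped in, which avoids the kind of address typo that is easy to make when copying the lines by hand.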
NOTE: With the command sudo ip rule list we can see the current routing rules:
0: from all lookup local
32765: from 192.168.222.200 lookup 1
32766: from all lookup main
32767: from all lookup default
As far as I understand, rules are added in decreasing order from 32767 towards 0 and tried in increasing order until one matches. The last two and rule "0" were already defined by default: the former because of the logic I cited earlier from this document, although that document says that rules start from "1", so I guess "0" must also be some predefined "default starting point".
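To make that ordering concrete, the sketch below lists, in lookup order, the tables whose rules match a given source address. It is a simplification under stated assumptions: tables_for_source is a hypothetical name, and it only understands "from all" and exact "from ADDRESS" selectors (prefixes, fwmark, iif and so on are ignored). As far as I understand the rule semantics, a rule whose table contains no matching route simply falls through to the next rule, which is why the priority 0 rule for table local does not swallow every packet.

```shell
# Hypothetical helper: given `ip rule list` output on stdin, print the tables
# consulted for source address $1, lowest priority number first.
tables_for_source() {
    sort -n | awk -v src="$1" '
        $2 == "from" && ($3 == "all" || $3 == src) {
            for (i = 4; i < NF; i++)
                if ($i == "lookup") print $(i + 1)
        }'
}

# Sample input: the rule list shown above.
tables_for_source 192.168.222.200 <<'EOF'
0: from all lookup local
32765: from 192.168.222.200 lookup 1
32766: from all lookup main
32767: from all lookup default
EOF
# prints: local, 1, main, default (one per line)
```

On a live machine the same function can be fed with ip rule list piped into it. For any other source address, table 1 drops out of the list and main is reached right after local.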
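EDIT 3:
As I said in EDIT 2 (of the question), I found this Linux Advanced Routing & Traffic Control HOWTO very helpful in clarifying things for me.
Specifically, the chapter Routing for multiple uplinks/providers was very useful for understanding setups with "network loops" (even though, in our case, we are not acting as a router to the Internet).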