How to reliably detect dropped netlink requests or responses
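I'm interested in using netlink for a simple application (reading cgroup statistics at a high frequency).
The man page warns that the protocol is unreliable, implying that applications need to be prepared to handle dropped packets: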
However, reliable transmissions from kernel to user are impossible in any case. The kernel can't send a netlink message if the socket buffer is full: the message will be dropped and the kernel and the user-space process will no longer have the same view of kernel state. It is up to the application to detect when this happens (via the ENOBUFS error returned by recvmsg(2)) and resynchronize.
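A minimal sketch of how a client that resets on anything unexpected might classify the result of recvmsg(2), assuming the errno values discussed above (ENOBUFS for messages the kernel dropped, EINTR for signals). The enum and the function classify_recv are hypothetical names, not part of any netlink API.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical action codes for a "reset on anything unexpected" client. */
enum nl_recv_action {
    NL_RECV_OK,     /* data arrived: parse it */
    NL_RECV_RETRY,  /* interrupted by a signal: call recvmsg() again */
    NL_RECV_RESYNC, /* ENOBUFS: the kernel dropped messages, our view is stale */
    NL_RECV_RESET   /* anything else (including a 0 return): tear down and start over */
};

/* Classify the return value n of recvmsg() and the errno captured right after it. */
static enum nl_recv_action classify_recv(ssize_t n, int err)
{
    if (n > 0)
        return NL_RECV_OK;
    if (n == 0)
        return NL_RECV_RESET; /* not expected on a netlink socket; be conservative */
    if (err == EINTR)
        return NL_RECV_RETRY;
    if (err == ENOBUFS)
        return NL_RECV_RESYNC;
    return NL_RECV_RESET;
}
```

Typical use would be `ssize_t n = recvmsg(fd, &msg, 0); int err = errno;` followed by a switch on `classify_recv(n, err)`. Under the simple policy in this question, NL_RECV_RESYNC and NL_RECV_RESET can both just recreate the socket.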
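Since my requirements are simple, I'm fine with just destroying the socket and creating a new one whenever anything unexpected happens. But I can't find any documentation on what my program should expect. For example, the recvmsg(2) man page doesn't even mention ENOBUFS.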
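What do I need to worry about to make sure I can tell when a request from my application or a response from the kernel has been dropped, so that I can reset everything and start over? Clearly I can do this whenever any of the system calls involved returns an error, but what happens if, for example, my request is dropped on its way to the kernel? Will I simply never get a response? Do I need to build a timeout mechanism so that I only wait so long for a response?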
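One way to get such a timeout, sketched under the assumption that a blocking socket is used: set SO_RCVTIMEO (documented in socket(7)) so that a blocking recvmsg() returns -1 with errno set to EAGAIN or EWOULDBLOCK once the deadline passes, and then treat that like any other error. The helpers ms_to_timeval and open_nl_with_timeout are hypothetical, and the timeout value is up to the application.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Convert a millisecond timeout into the struct timeval that SO_RCVTIMEO expects. */
static struct timeval ms_to_timeval(long ms)
{
    struct timeval tv;
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return tv;
}

/* Open a blocking netlink socket for `protocol` (e.g. NETLINK_GENERIC) whose
 * receive calls give up after timeout_ms. Returns the fd, or -1 with errno set. */
static int open_nl_with_timeout(int protocol, long timeout_ms)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, protocol);
    if (fd < 0)
        return -1;

    struct timeval tv = ms_to_timeval(timeout_ms);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) < 0) {
        int saved = errno; /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

With this in place, a reply that never arrives surfaces as EAGAIN/EWOULDBLOCK from recvmsg(), which the reset-everything policy can handle exactly like any other error.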
I found the following in Communicating between the kernel and user-space in Linux using Netlink sockets by Ayuso, Gasca, and Lefevre:
If Netlink fails to deliver a message that goes from kernel to user-space, the recvmsg() function returns the No buffer space available (ENOBUFS) error. Thus, the user-space process knows that it is losing messages [...]
On the other hand, buffer overruns cannot occur in communications from user to kernel-space since sendmsg() synchronously passes the Netlink message to the kernel subsystem. If blocking sockets are used, Netlink is completely reliable in communications from user to kernel-space since memory allocations would wait, so no memory exhaustion is possible.
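Taking that description at face value, the sending side can be sketched as: build a header with a fresh sequence number, send it to the kernel (nl_pid 0), and treat a short write or any error other than EINTR as a reason to reset. nl_build_request and nl_send_request are hypothetical helpers; the payload that follows the header depends on the netlink family and is passed in as opaque bytes here.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <linux/netlink.h>

/* Fill buf with a netlink header followed by payload_len bytes of payload.
 * Returns the message length (nlmsg_len), or 0 if buf is too small. */
static size_t nl_build_request(void *buf, size_t buflen, uint16_t type,
                               uint16_t flags, uint32_t seq,
                               const void *payload, size_t payload_len)
{
    size_t len = NLMSG_LENGTH(payload_len);
    if (NLMSG_ALIGN(len) > buflen)
        return 0;

    memset(buf, 0, NLMSG_ALIGN(len));
    struct nlmsghdr *h = buf;
    h->nlmsg_len = (uint32_t)len;
    h->nlmsg_type = type;
    h->nlmsg_flags = NLM_F_REQUEST | flags;
    h->nlmsg_seq = seq;   /* lets us match the reply to this request */
    h->nlmsg_pid = 0;     /* 0 is commonly used for requests to the kernel */
    if (payload_len > 0)
        memcpy(NLMSG_DATA(h), payload, payload_len);
    return len;
}

/* Send one request to the kernel on a blocking socket.
 * Returns 0 if the whole message was accepted, -1 otherwise (errno set on error). */
static int nl_send_request(int fd, const void *msg, size_t len)
{
    struct sockaddr_nl kernel = { .nl_family = AF_NETLINK }; /* nl_pid 0 = kernel */
    ssize_t n;
    do {
        n = sendto(fd, msg, len, 0, (struct sockaddr *)&kernel, sizeof kernel);
    } while (n < 0 && errno == EINTR);
    return (n == (ssize_t)len) ? 0 : -1;
}
```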
As for acks, it looks like they are optional:
NLM_F_ACK: the user-space application requested a confirmation message from kernel-space to make sure that a given request was successfully performed. If this flag is not set, the kernel-space reports the error synchronously via sendmsg() as errno value.
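When NLM_F_ACK is set, netlink(7) describes the acknowledgment as an NLMSG_ERROR message whose error field is 0 on success, or a negative errno on failure. A sketch of checking a received message against the single outstanding request, identified by its sequence number; nl_check_reply is a hypothetical helper.

```c
#include <errno.h>
#include <stdint.h>
#include <linux/netlink.h>

/* Inspect one message received for the request with sequence number `seq`.
 *   0       -> NLMSG_ERROR with error == 0, i.e. the ack
 *   1       -> ordinary data message belonging to the reply
 *   -errno  -> NLMSG_ERROR carrying the kernel's error code
 *   -EPROTO -> truncated, or belongs to some other request
 * (A kernel-reported EPROTO looks the same as our own -EPROTO, which is
 * harmless when every error leads to a reset anyway.) */
static int nl_check_reply(const struct nlmsghdr *h, uint32_t seq)
{
    if (h->nlmsg_seq != seq)
        return -EPROTO;
    if (h->nlmsg_type != NLMSG_ERROR)
        return 1;
    if (h->nlmsg_len < NLMSG_LENGTH(sizeof(struct nlmsgerr)))
        return -EPROTO;
    const struct nlmsgerr *e = NLMSG_DATA(h);
    return e->error; /* 0 = ack, otherwise a negative errno */
}
```

Multipart dump replies terminated by NLMSG_DONE are not handled in this sketch; they would show up as "data" (return value 1) and need their own check.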
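So it sounds like, for my simple use case, I can naively use sendmsg and recvmsg and react to any error (other than EINTR) by starting the whole thing over, perhaps with backoff. My guess is that since I get only one response per request, and the responses are small, I'll never even see ENOBUFS as long as I only have one request in flight at a time.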
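The start-everything-over policy described above could look roughly like this: close the socket (which also discards anything still queued on it), sleep for a capped exponential backoff, and open a fresh socket. The constants 10 ms and 1000 ms and the helpers backoff_ms and nl_reset are illustrative assumptions, not recommendations.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Capped exponential backoff: base_ms, 2*base_ms, 4*base_ms, ... up to max_ms. */
static long backoff_ms(unsigned attempt, long base_ms, long max_ms)
{
    long d = base_ms;
    while (attempt-- > 0 && d < max_ms)
        d *= 2;
    return d < max_ms ? d : max_ms;
}

/* Close the old socket (if any), wait, then open a fresh one.
 * Returns the new fd, or -1 with errno set. */
static int nl_reset(int old_fd, int protocol, unsigned attempt)
{
    if (old_fd >= 0)
        close(old_fd);

    long ms = backoff_ms(attempt, 10, 1000);
    struct timespec ts = { .tv_sec = ms / 1000, .tv_nsec = (ms % 1000) * 1000000L };
    while (nanosleep(&ts, &ts) < 0 && errno == EINTR)
        ; /* a signal interrupted the sleep: sleep for the remainder */

    return socket(AF_NETLINK, SOCK_RAW, protocol);
}
```

The caller would pass an attempt counter that grows with each consecutive failure and goes back to 0 after a successful request/response round trip.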
As a side note, dropped netlink packets are visible in the Drops column of /proc/net/netlink. For example:
# cat /proc/net/netlink
sk Eth Pid Groups Rmem Wmem Dump Locks Drops Inode
76f0e0ed 0 1966 00080551 0 0 0 2 0 36968
36a83ab1 0 1431 00000001 0 0 0 2 0 30297
d7d5db8e 0 563 00000440 0 0 0 2 0 19572
a10eb5c0 0 795 00000515 704 0 0 2 0 23584
c52bbce9 0 474 00000001 0 0 0 2 0 17511
3c5a89a5 0 989856248 00000001 0 0 0 2 0 31686
051108c1 0 0 00000000 0 0 0 2 0 25
ed401538 0 562 00000440 0 0 0 2 0 19576
38699987 0 469 00000557 0 0 0 2 0 19806
d7bbb203 0 728 00000113 0 0 0 2 0 22988
4d31126f 2 795 40000000 0 0 0 2 0 31092
febb9674 2 2100 00000001 0 0 0 2 0 37904
8c18eb5b 2 728 40000000 0 0 0 2 0 22989
922a7fcf 4 0 00000000 0 0 0 2 0 8681
16cfa740 7 0 00000000 0 0 0 2 0 7680
4e55a095 9 395 00000000 0 0 0 2 0 15142
0b2c5994 9 1 00000000 0 0 0 2 0 10840
94fe571b 9 0 00000000 0 0 0 2 0 7673
a7a1d82c 9 396 00000000 0 0 0 2 0 14484
b6a3f183 10 0 00000000 0 0 0 2 0 7517
1c2dc7e3 11 0 00000000 0 0 0 2 0 640
6dafb596 12 469 00000007 0 0 0 2 0 19810
2fe2c14c 12 3676872482 00000007 0 0 0 2 0 19811
3f245567 12 0 00000000 0 0 0 2 0 8682
0da2ddc4 12 3578344684 00000000 0 0 0 2 0 43683
f9720247 15 489 00000001 0 0 0 2 11 17781
e84b6d30 15 519 00000002 0 0 0 2 0 19071
c7d75154 15 1550 ffffffff 0 0 0 2 0 31970
02c1c3db 15 4070855316 00000001 0 0 0 2 0 10852
e0d7b09a 15 1 00000002 0 0 0 2 0 10821
78649432 15 0 00000000 0 0 0 2 0 30
8182eaf3 15 504 00000002 0 0 0 2 0 22047
40263df1 15 1858 ffffffff 0 0 0 2 0 34001
49283e31 16 0 00000000 0 0 0 2 0 696
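To watch that counter from the program itself, here is a sketch that sums the Drops column of /proc/net/netlink for one protocol (the Eth column). It assumes the ten-column layout shown above; the format is not a stable ABI, so the parsing should be checked against the target kernel. parse_netlink_line and netlink_drops_for are hypothetical names.

```c
#include <stdio.h>

/* Parse one data line of /proc/net/netlink.
 * Columns: sk Eth Pid Groups Rmem Wmem Dump Locks Drops Inode.
 * Returns 1 and fills *proto and *drops on success, 0 for the header or junk. */
static int parse_netlink_line(const char *line, unsigned *proto, unsigned long *drops)
{
    return sscanf(line, "%*s %u %*u %*x %*d %*d %*d %*d %lu", proto, drops) == 2;
}

/* Sum the Drops column over all sockets of the given netlink protocol. */
static unsigned long netlink_drops_for(FILE *f, unsigned proto)
{
    char line[256];
    unsigned long total = 0;
    while (fgets(line, sizeof line, f)) {
        unsigned p;
        unsigned long d;
        if (parse_netlink_line(line, &p, &d) && p == proto)
            total += d;
    }
    return total;
}
```

For example, opening the file with `fopen("/proc/net/netlink", "r")` and calling `netlink_drops_for(f, 16)` would report drops on NETLINK_GENERIC sockets (protocol 16).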