Why is a kretprobe on sys_futex called less often than a corresponding kprobe?
I'm tracing various kernel functions and system calls, and building patterns across them that can be used for performance analysis.
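One thing I've noticed is that sometimes, even in my simple test application, which spins up a few threads that use some mutexes, I don't get any calls to `kretprobe__sys_futex`, but I do get plenty of calls to `kprobe__sys_futex`.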
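I assumed this was because, for example, a thread was calling `sys_futex` and then going to sleep or perhaps terminating, but I actually see the same process calling `sys_futex` many times in a row without the return probe ever noticing anything.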
I then assumed the problem was in how I was filtering calls to `kprobe__sys_futex`, so I made a minimal example using BCC/eBPF to test this:
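```python
#! /usr/bin/env python
from bcc import BPF

# Raw string, so that "\n" reaches the C compiler as an escape sequence
# instead of being turned into a literal newline by Python.
b = BPF(text=r"""
#include <uapi/linux/ptrace.h>

// Single global counter at key 0: +1 on every entry, -1 on every return.
BPF_HASH(call_count, int, int);

int kprobe__sys_futex(struct pt_regs *ctx) {
    int zero = 0;
    call_count.lookup_or_init(&zero, &zero);  // make sure key 0 exists
    bpf_trace_printk("futex start\n");
    call_count.increment(zero);
    return 0;
}

int kretprobe__sys_futex(struct pt_regs *ctx) {
    int zero = 0;
    int *val_p = call_count.lookup(&zero);
    if (val_p != NULL) {
        // Note: this lookup/update pair is not atomic across CPUs.
        int val = *val_p;
        val--;
        call_count.update(&zero, &val);
        bpf_trace_printk("futex calls with no return: %d\n", val);
    } else {
        bpf_trace_printk("unexpected futex return\n");
    }
    return 0;
}
""")
b.trace_print()
```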
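I've noticed that in various applications (a good example is mysql-server, which performs futex operations regularly even when idle, at least on my machine), many (often 10+) `futex start` messages are printed before any message from the return probe.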
Here's a sample trace from the program above, which I left running for a few minutes while writing this post:
```
... hundreds of lines of much the same as below
gdbus-612 [001] .... 211229.997665: 0x00000001: futex start
NetworkManager-541 [001] .... 211229.997667: 0x00000001: futex start
gdbus-612 [001] .... 211229.997670: 0x00000001: futex start
mysqld-697 [001] .... 211230.789205: 0x00000001: futex start
mysqld-697 [001] .... 211230.789227: 0x00000001: futex start
mysqld-703 [001] .... 211230.789251: 0x00000001: futex start
mysqld-703 [001] .... 211230.789253: 0x00000001: futex start
mysqld-704 [001] d... 211230.789258: 0x00000001: futex calls with no return: 3994
mysqld-704 [001] .... 211230.789259: 0x00000001: futex start
mysqld-704 [001] d... 211230.789260: 0x00000001: futex calls with no return: 3994
mysqld-704 [001] .... 211230.789272: 0x00000001: futex start
mysqld-713 [000] .... 211231.037016: 0x00000001: futex start
mysqld-713 [000] .... 211231.037036: 0x00000001: futex start
vmstats-895 [000] .... 211231.464867: 0x00000001: futex start
mysqld-697 [001] .... 211231.790738: 0x00000001: futex start
mysqld-697 [001] .... 211231.790784: 0x00000001: futex start
mysqld-703 [001] .... 211231.790796: 0x00000001: futex start
mysqld-703 [001] .... 211231.790799: 0x00000001: futex start
mysqld-704 [001] d... 211231.790809: 0x00000001: futex calls with no return: 4001
mysqld-704 [001] .... 211231.790812: 0x00000001: futex start
mysqld-704 [001] d... 211231.790814: 0x00000001: futex calls with no return: 4001
```
As you can see, pid 697, for example, seems to have called `sys_futex` four times without returning, in this small trace alone.
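To check that observation over a longer capture, a small script can tally the messages per task. This is a minimal sketch that assumes the line layout shown in the trace above (`comm-pid [cpu] flags timestamp: ip: message`) and the two message strings printed by the program; `count_events` is a hypothetical helper name. Each return message is attributed to the task that returned, but the number it prints is the global counter, not a per-task value.

```python
import re
from collections import Counter

# Matches lines such as
#   "mysqld-697 [001] .... 211230.789205: 0x00000001: futex start"
# The greedy comm group backtracks to the last "-<pid>" before "[cpu]",
# so task names that contain '-' are still split correctly.
LINE_RE = re.compile(
    r"^\s*(?P<comm>.+)-(?P<pid>\d+)\s+\[\d+\]\s+\S+\s+[\d.]+:\s+\S+\s+(?P<msg>.*)$"
)


def count_events(lines):
    """Return (starts, returns): Counters keyed by (comm, pid)."""
    starts, returns = Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip elision notes and anything unparseable
        key = (m.group("comm"), int(m.group("pid")))
        msg = m.group("msg").strip()
        if msg == "futex start":
            starts[key] += 1
        elif msg.startswith("futex calls with no return"):
            returns[key] += 1
    return starts, returns
```

With a capture saved to a file (the name is hypothetical), `starts, returns = count_events(open("futex-trace.txt"))` makes tasks with many starts and no returns, like `mysqld-697` in the excerpt, easy to spot.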
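I don't think this is a race condition in the eBPF code: if you mute the print statements and only print periodically, the count for `sys_write` usually stays bounded near zero, and `sys_write` happens orders of magnitude more often than `sys_futex` (at least with my system's workload), so I would expect any race condition to be made worse there, not better.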
I'm running kernel 4.15.0-43-generic on Ubuntu 18.04 LTS, inside VirtualBox.
Happy to provide more context if it would be useful!
There's a related thread on the IOVisor mailing list: https://lists.iovisor.org/g/iovisor-dev/topic/29702757
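This is a known limitation of bcc (see iovisor/bcc#1072). Basically, the maximum number of active probes is set too low for your tracing context, so you're missing some return probes. `sys_futex` often blocks (a thread waiting on a mutex sleeps inside the call), so many threads can be inside it at the same time, and each of them holds an active return-probe instance until the call returns.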
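In bcc, the `maxactive` value (the maximum number of active probes; see the documentation excerpt below) is left at its default. Thanks to Alban Crequy's patch to the Linux kernel (see iovisor/bcc#1072), `maxactive` can be changed when attaching probes through debugfs. That new API isn't exposed through bcc yet, though; I'll try to send a patch to that effect this week.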
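For reference, the debugfs interface in question is the ftrace `kprobe_events` file, whose documented return-probe syntax is `r[MAXACTIVE][:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS]`. The minimal Python sketch below only builds such a definition and, optionally, appends it to that file; it is not a bcc API. The tracing mount point, the `kprobes` group, the `futex_ret` event name and both helper names are assumptions for illustration.

```python
from typing import Optional

# Assumed location of the ftrace control file (debugfs mounted at /sys/kernel/debug).
KPROBE_EVENTS = "/sys/kernel/debug/tracing/kprobe_events"


def kretprobe_event_line(symbol: str, event: str, group: str = "kprobes",
                         maxactive: Optional[int] = None) -> str:
    """Build a return-probe definition: r[MAXACTIVE][:[GRP/]EVENT] SYM."""
    if maxactive is not None and maxactive <= 0:
        raise ValueError("maxactive must be a positive integer")
    # Without a number the kernel falls back to its default maxactive.
    prefix = "r" if maxactive is None else f"r{maxactive}"
    return f"{prefix}:{group}/{event} {symbol}"


def register(line: str, path: str = KPROBE_EVENTS) -> None:
    """Append a definition (needs root); append mode keeps existing probes."""
    with open(path, "a") as f:
        f.write(line + "\n")
```

For example, running `register(kretprobe_event_line("sys_futex", "futex_ret", maxactive=512))` as root would define a return probe on `sys_futex` with room for 512 concurrent instances; 512 is an arbitrary illustrative value. This only defines the trace event; attaching a BPF program to it is the part bcc would have to do.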
> While the probed function is executing, its return address is stored in an object of type kretprobe_instance. Before calling register_kretprobe(), the user sets the maxactive field of the kretprobe struct to specify how many instances of the specified function can be probed simultaneously. register_kretprobe() pre-allocates the indicated number of kretprobe_instance objects.
>
> For example, if the function is non-recursive and is called with a spinlock held, maxactive = 1 should be enough. If the function is non-recursive and can never relinquish the CPU (e.g., via a semaphore or preemption), NR_CPUS should be enough. If maxactive <= 0, it is set to a default value. If CONFIG_PREEMPT is enabled, the default is max(10, 2*NR_CPUS). Otherwise, the default is NR_CPUS.
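The excerpt explains why a blocking call like `sys_futex` is affected: an instance stays in use for as long as the call is in progress. The toy Python model below (not kernel code) encodes the default `maxactive` rule quoted above and a pool of that many instances. When more threads are inside the function than there are instances, the separate entry kprobe still fires for every call, but the extra returns are never seen. The CPU count and whether `CONFIG_PREEMPT` is set are plain parameters here, not values read from the asker's kernel.

```python
def default_maxactive(nr_cpus: int, preempt: bool) -> int:
    """Default used when maxactive <= 0, following the excerpt above."""
    return max(10, 2 * nr_cpus) if preempt else nr_cpus


class KretprobePool:
    """Toy model of the pre-allocated kretprobe_instance pool."""

    def __init__(self, maxactive: int):
        self.free = maxactive
        self.armed = set()  # threads whose return will be probed
        self.missed = 0     # entries that found the pool empty (kernel: nmissed)

    def enter(self, tid: int) -> None:
        if self.free > 0:
            self.free -= 1
            self.armed.add(tid)
        else:
            self.missed += 1  # this call's return will not be probed

    def leave(self, tid: int) -> bool:
        """Return True if the return probe fires for this call."""
        if tid in self.armed:
            self.armed.remove(tid)
            self.free += 1
            return True
        return False
```

Assuming, say, two CPUs and no `CONFIG_PREEMPT`, five threads parked in `sys_futex` at the same time leave three returns unprobed, which produces the kind of steadily growing counter seen in the question.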