Semaphore latency faster than expected - why?
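A semaphore is acquired by blocking. According to the internet and clockres, the interrupt frequency / timer interval on Windows shouldn't be under 0.5ms. The program below measures the time between the release and the acquisition of a semaphore in different threads. I wouldn't expect this to be faster than 0.5ms, yet I reliably get results of ~0.017ms. (Curiously, the standard deviation is as high as +-100%.)
Either my measurement code is wrong, or my understanding of how semaphores work is. Which is it? The code, without the boring part that computes the mean and standard deviation: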
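#include <atomic>
#include <chrono>
#include <format>
#include <iostream>
#include <semaphore>
#include <thread>
#include <vector>
using namespace std::chrono_literals; // for the 5ms literals below
namespace {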
std::binary_semaphore semaphore{ 0 };
std::atomic<std::chrono::high_resolution_clock::time_point> t1;
}
auto acquire_and_set_t1() {
semaphore.acquire(); // this is being measured
t1.store(std::chrono::high_resolution_clock::now());
}
auto measure_semaphore_latency() -> double {
std::jthread j(acquire_and_set_t1);
std::this_thread::sleep_for(5ms); // To make sure thread is running
// Signal thread and start timing
const auto t0 = std::chrono::high_resolution_clock::now();
semaphore.release();
std::this_thread::sleep_for(5ms); // To make sure thread is done writing t1
const double ms = std::chrono::duration_cast<std::chrono::nanoseconds>(t1.load() - t0).count() / 1'000'000.0;
return ms;
}
auto main() -> int {
std::vector<double> runtimes;
for (int i = 0; i < 100; ++i)
runtimes.emplace_back(measure_semaphore_latency());
const auto& [mean, relative_std] = get_mean_and_std(runtimes);
std::cout << std::format("mean: {:.3f} ms, +- {:.2f}%\n", mean, 100.0 * relative_std);
}
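For anyone who wants to compile the snippet: the omitted helper could look roughly like the sketch below. This is an assumption, not the original code. It guesses that `get_mean_and_std` returns the mean together with the population standard deviation divided by the mean, which is what the `relative_std` name and the percentage output suggest.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for the helper omitted from the question.
// Returns {mean, relative standard deviation}, where the relative value is
// the population standard deviation divided by the mean.
auto get_mean_and_std(const std::vector<double>& samples) -> std::pair<double, double> {
    assert(!samples.empty());
    const double n = static_cast<double>(samples.size());
    const double mean = std::accumulate(samples.begin(), samples.end(), 0.0) / n;

    // Sum of squared deviations from the mean.
    double sum_sq = 0.0;
    for (const double x : samples)
        sum_sq += (x - mean) * (x - mean);

    const double std_dev = std::sqrt(sum_sq / n);
    // Guard against division by zero when the mean is exactly 0.
    return { mean, mean != 0.0 ? std_dev / mean : 0.0 };
}
```

With a helper of this shape, the structured binding in `main()` compiles unchanged.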
Edit: The sources for the Windows timer resolution are https://randomascii.wordpress.com/2020/10/04/windows-timer-resolution-the-great-rule-change/ and ClockRes.
Your confusion comes from the false assumption that this comes into play:
According to the internet and clockres, the interrupt frequency / timer interval on Windows shouldn't be under 0.5ms.
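Preemptive / timer-based scheduling isn't necessarily the only opportunity for the OS to assign threads to CPU cores. Explicit / manual signaling can bypass it.
You can think of it as std::binary_semaphore::release() triggering an immediate, partial run of the scheduler that targets only threads which happen to be blocked in std::binary_semaphore::acquire() on the same semaphore.
This is what's happening here. The thread running acquire_and_set_t1() is woken up and potentially assigned to a CPU core right at the moment measure_semaphore_latency() calls release(), without waiting for the next scheduling "cycle".
It's still not guaranteed that the OS will choose to preempt anything in favor of the thread that was just woken up. That's where the high standard deviation you are seeing comes from: the thread either gets CPU time immediately, or it gets it in a later scheduling cycle, with nothing in between.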
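As for why I can be so sure that this is the case in your test: with a bit of debugging and symbol loading, we can obtain the following call stacks:
Acquiring side: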
ntdll.dll!00007fffa4510764() Unknown
ntdll.dll!00007fffa44d379d() Unknown
ntdll.dll!00007fffa44d3652() Unknown
ntdll.dll!00007fffa44d3363() Unknown
KernelBase.dll!00007fffa225ce9f() Unknown
> msvcp140d_atomic_wait.dll!`anonymous namespace'::__crtWaitOnAddress(volatile void * Address, void * CompareAddress, unsigned __int64 AddressSize, unsigned long dwMilliseconds) Line 174 C++
msvcp140d_atomic_wait.dll!__std_atomic_wait_direct(const void * _Storage, void * _Comparand, unsigned __int64 _Size, unsigned long _Remaining_timeout) Line 234 C++
ConsoleApplication2.exe!std::_Atomic_wait_direct<unsigned char,char>(const std::_Atomic_storage<unsigned char,1> * const _This, char _Expected_bytes, const std::memory_order _Order) Line 491 C++
ConsoleApplication2.exe!std::_Atomic_storage<unsigned char,1>::wait(const unsigned char _Expected, const std::memory_order _Order) Line 829 C++
ConsoleApplication2.exe!std::counting_semaphore<1>::acquire() Line 245 C++
ConsoleApplication2.exe!acquire_and_set_t1() Line 17 C++
ConsoleApplication2.exe!std::invoke<void (__cdecl*)(void)>(void(*)() && _Obj) Line 1586 C++
ConsoleApplication2.exe!std::thread::_Invoke<std::tuple<void (__cdecl*)(void)>,0>(void * _RawVals) Line 56 C++
ucrtbased.dll!00007fff4c7b542c() Unknown
kernel32.dll!00007fffa2857034() Unknown
ntdll.dll!00007fffa44c2651() Unknown
Releasing side:
ntdll.dll!00007fffa44d2550() Unknown
> msvcp140d_atomic_wait.dll!`anonymous namespace'::__crtWakeByAddressSingle(void * Address) Line 179 C++
msvcp140d_atomic_wait.dll!__std_atomic_notify_one_direct(const void * _Storage) Line 251 C++
ConsoleApplication2.exe!std::_Atomic_storage<unsigned char,1>::notify_one() Line 833 C++
ConsoleApplication2.exe!std::counting_semaphore<1>::release(const __int64 _Update) Line 232 C++
ConsoleApplication2.exe!measure_semaphore_latency() Line 29 C++
ConsoleApplication2.exe!main() Line 36 C++
ConsoleApplication2.exe!invoke_main() Line 79 C++
ConsoleApplication2.exe!__scrt_common_main_seh() Line 288 C++
ConsoleApplication2.exe!__scrt_common_main() Line 331 C++
ConsoleApplication2.exe!mainCRTStartup(void * __formal) Line 17 C++
kernel32.dll!00007fffa2857034() Unknown
ntdll.dll!00007fffa44c2651() Unknown
Looking at the code of __crtWakeByAddressSingle() and __crtWaitOnAddress() (see on github), we find that the Windows API functions they invoke are WaitOnAddress() (ref) and WakeByAddressSingle() (ref).
From that documentation, we find our confirmation in the remarks section of WaitOnAddress():
WaitOnAddress does not interfere with the thread scheduler.