How to use an eventfd with level triggered behaviour on epoll?

epoll_ctl 上注册一个级别触发的 eventfd 仅在不递减 eventfd 计数器时触发一次。总结一下这个问题,我观察到 epoll 标志(EPOLLETEPOLLONESHOTNone 用于级别触发行为)表现相似。或者换句话说:没有效果。

Can you confirm this bug?

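I have a multi-threaded application. Each thread waits for new events with epoll_wait on the same epollfd. To terminate the application gracefully, all threads have to be woken up. My idea was to use an eventfd counter (EFD_SEMAPHORE|EFD_NONBLOCK), registered with level-triggered epoll behaviour, to wake them all together. (I am ignoring the thundering-herd problem, since only a few file descriptors are involved.)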

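For example, with 4 threads you write 4 to the eventfd. I expected epoll_wait to return again and again until the counter had been decremented (read) 4 times. Instead, epoll_wait returns only once per write.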

Yes, I read all the relevant manuals carefully ;)

#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>
#include <pthread.h>

static int event_fd = -1;
static int epoll_fd = -1;

void *thread(void *arg)
{
    (void) arg;

    for(;;) {
        struct epoll_event event;
        /* retry if epoll_wait fails (e.g. EINTR) so that `event` is never
           read uninitialized */
        if (epoll_wait(epoll_fd, &event, 1, -1) != 1)
            continue;

        /* handle events */
        if(event.data.fd == event_fd && event.events & EPOLLIN) {
            uint64_t val = 0;
            eventfd_read(event_fd, &val);
            break;
        }
    }

    return NULL;
}

int main(void)
{
    epoll_fd = epoll_create1(0);
    event_fd = eventfd(0, EFD_SEMAPHORE| EFD_NONBLOCK);

    struct epoll_event event;
    event.events = EPOLLIN;
    event.data.fd = event_fd;
    epoll_ctl(epoll_fd, EPOLL_CTL_ADD, event_fd, &event);

    enum { THREADS = 4 };
    pthread_t thrd[THREADS];

    for (int i = 0; i < THREADS; i++)
        pthread_create(&thrd[i], NULL, &thread, NULL);

    /* let threads park internally (kernel does readiness check before sleeping) */
    usleep(100000);
    eventfd_write(event_fd, THREADS);

    for (int i = 0; i < THREADS; i++)
        pthread_join(thrd[i], NULL);
}

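When you write to an eventfd, the kernel wakes up its waiters. In fs/eventfd.c, both eventfd_signal and the write(2) handler eventfd_write do this with the following line: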

wake_up_locked_poll(&ctx->wqh, EPOLLIN);

wake_up_locked_poll is a macro:

#define wake_up_locked_poll(x, m)                       \
    __wake_up_locked_key((x), TASK_NORMAL, poll_to_key(m))

__wake_up_locked_key is defined as:

void __wake_up_locked_key(struct wait_queue_head *wq_head, unsigned int mode, void *key)
{
    __wake_up_common(wq_head, mode, 1, 0, key, NULL);
}

Finally, __wake_up_common is declared as:

/*
 * The core wakeup function. Non-exclusive wakeups (nr_exclusive == 0) just
 * wake everything up. If it's an exclusive wakeup (nr_exclusive == small +ve
 * number) then we wake all the non-exclusive tasks and one exclusive task.
 *
 * There are circumstances in which we can try to wake a task which has already
 * started to run but is not in state TASK_RUNNING. try_to_wake_up() returns
 * zero in this (rare) case, and we handle it by continuing to scan the queue.
 */
static int __wake_up_common(struct wait_queue_head *wq_head, unsigned int mode,
            int nr_exclusive, int wake_flags, void *key,
            wait_queue_entry_t *bookmark)

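Note the nr_exclusive argument: __wake_up_locked_key passes 1, so a write to the eventfd wakes all non-exclusive waiters but only one exclusive waiter.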
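To make the nr_exclusive rule concrete, here is a toy user-space model of the loop in __wake_up_common. It is a sketch, not kernel code: struct model_waiter and model_wake_up_common are hypothetical names, and the model ignores the "already running" case described in the kernel comment above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a kernel wait queue entry. */
struct model_waiter {
    bool exclusive;  /* queued as an exclusive waiter */
    bool woken;      /* set when the model "wakes" this waiter */
};

/*
 * Toy model of the scan in __wake_up_common(): walk the queue in order,
 * wake each waiter, and stop once nr_exclusive exclusive waiters have
 * been woken. With nr_exclusive == 0 the decrement never reaches zero,
 * so every waiter is woken. Returns the number of waiters woken.
 */
static int model_wake_up_common(struct model_waiter *q, size_t n,
                                int nr_exclusive)
{
    int woken = 0;

    for (size_t i = 0; i < n; i++) {
        q[i].woken = true;
        woken++;
        if (q[i].exclusive && !--nr_exclusive)
            break;  /* enough exclusive waiters woken */
    }
    return woken;
}
```

With four exclusive waiters (the four threads blocked in epoll_wait), model_wake_up_common(q, 4, 1) wakes only the first one, while passing nr_exclusive == 4 would wake all four.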

What does exclusive mean? The epoll_ctl man page gives some insight:

EPOLLEXCLUSIVE (since Linux 4.5):

Sets an exclusive wakeup mode for the epoll file descriptor that is being attached to the target file descriptor, fd. When a wakeup event occurs and multiple epoll file descriptors are attached to the same target file using EPOLLEXCLUSIVE, one or more of the epoll file descriptors will receive an event with epoll_wait(2).

EPOLLEXCLUSIVE is not used when the event is added, but to wait in epoll_wait each thread has to put itself on a wait queue. The function do_epoll_wait performs the wait by calling ep_poll. Following the code, you can see that ep_poll adds the current thread to a wait queue at line #1903 of fs/eventpoll.c:

__add_wait_queue_exclusive(&ep->wq, &wait);

That explains what is happening: epoll waiters are exclusive, so only one thread is woken up. This behaviour was introduced in v2.6.22-rc1, and the relevant change has been discussed here.

To me this looks like a bug in the eventfd_signal function: in semaphore mode, it should perform the wakeup with nr_exclusive equal to the value written.

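So your options are:

  • Create a separate epoll descriptor for each thread (may not fit your design; scaling problems)
  • Put a mutex around it (scaling problems)
  • Use poll, possibly on both the eventfd and the epoll descriptor
  • Wake each thread separately by writing 1 to the eventfd four times, one eventfd_write call per thread (probably the best you can do)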
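The last option can be sketched as follows. This is a minimal sketch assuming Linux and glibc's eventfd_read/eventfd_write wrappers; worker, shutdown_workers, woken and MAX_THREADS are hypothetical names. Each worker consumes exactly one unit (EFD_SEMAPHORE) and exits, and the shutdown side writes 1 once per worker so that every write produces its own wakeup. A worker that wakes up but loses the race for a unit (EAGAIN from the non-blocking read) simply waits again.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

static int event_fd = -1;
static int epoll_fd = -1;
static atomic_int woken = 0;  /* workers that consumed one shutdown unit */

static void *worker(void *arg)
{
    (void) arg;
    for (;;) {
        struct epoll_event ev;
        int n = epoll_wait(epoll_fd, &ev, 1, -1);
        if (n < 0) {
            if (errno == EINTR)
                continue;  /* interrupted by a signal: wait again */
            return NULL;   /* real error: give up without a unit */
        }
        if (n == 1 && ev.data.fd == event_fd && (ev.events & EPOLLIN)) {
            eventfd_t val;
            /* EFD_SEMAPHORE: a successful read takes exactly one unit. */
            if (eventfd_read(event_fd, &val) == 0) {
                atomic_fetch_add(&woken, 1);
                return NULL;
            }
            /* EAGAIN: another worker took the unit first; keep waiting. */
        }
    }
}

/* Starts nthreads workers blocked in epoll_wait() on one epoll fd, then
 * wakes them by writing 1 to the eventfd once per worker. Returns the
 * number of workers that consumed a unit, or -1 on setup failure. */
static int shutdown_workers(int nthreads)
{
    enum { MAX_THREADS = 64 };
    pthread_t thrd[MAX_THREADS];
    int started = 0;

    if (nthreads <= 0 || nthreads > MAX_THREADS)
        return -1;

    epoll_fd = epoll_create1(0);
    event_fd = eventfd(0, EFD_SEMAPHORE | EFD_NONBLOCK);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = event_fd };
    if (epoll_fd < 0 || event_fd < 0 ||
        epoll_ctl(epoll_fd, EPOLL_CTL_ADD, event_fd, &ev) < 0)
        return -1;  /* sketch: fds are not closed on this error path */

    atomic_store(&woken, 0);
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&thrd[i], NULL, worker, NULL) != 0)
            break;
        started++;
    }

    usleep(100000);  /* give the workers time to block in epoll_wait() */

    /* One write of 1 per worker: each write triggers its own wakeup. */
    for (int i = 0; i < started; i++)
        eventfd_write(event_fd, 1);

    for (int i = 0; i < started; i++)
        pthread_join(thrd[i], NULL);

    close(event_fd);
    close(epoll_fd);
    return atomic_load(&woken);
}
```

Compile with -pthread. Calling shutdown_workers(4) from main should return 4 once all four workers have been woken and joined; the usleep only makes it likely that every worker is already blocked in epoll_wait before the writes start.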