IOCP threads - Clarification?

After reading this article, which states:

After a device finishes its job (the IO operation), it notifies the CPU via an interrupt.

... ... ...

However, that “completion” status only exists at the OS level; the process has its own memory space that must be notified

... ... ...

Since the library/BCL is using the standard P/Invoke overlapped I/O system, it has already registered the handle with the I/O Completion Port (IOCP), which is part of the thread pool.

... ... ...

So an I/O thread pool thread is borrowed briefly to execute the APC, which notifies the task that it’s complete.

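I'm curious about the bolded part (the I/O thread pool thread being "borrowed briefly"):

If I understand correctly, once the IO operation has finished, it has to notify the actual process that issued the IO operation.

Question #1:

Does it mean that it grabs a new thread pool thread for each completed IO operation? Or is it a dedicated number of threads for this?

Question #2:

Looking at: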

for (int i = 0; i < 1000; i++)
{
    PingAsync_NOT_AWAITED(i); // notice: not awaited!
}

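Does it mean that I'll have 1000 IOCP threadpool threads simultaneously (sort of) running here, when all are finished?

This is a bit broad, so let me just address the major points:

IOCP threads live in a separate thread pool, so to speak - that's the I/O threads setting. So they do not clash with the user thread pool threads (like the ones you get in normal await operations or with ThreadPool.QueueUserWorkItem).

Just like the normal thread pool, it only allocates new threads slowly over time. So even if there's a peak of async responses that all arrive at once, you're not going to have 1000 I/O threads.

In a properly asynchronous application, you're not going to have more than roughly the number of cores, just like with the worker threads. That's because you're either doing significant CPU work, which you should post to a normal worker thread, or you're doing I/O work, which you should do as an asynchronous operation.

The idea is that you spend very little time in the I/O callback - you don't block, and you don't do a lot of CPU work. If you violate this (say, by adding Thread.Sleep(10000) to your callback), then yes, .NET will create lots of IO threads over time - but that's just improper usage.

Now, how are I/O threads different from normal CPU threads? They're almost the same; they just wait for a different signal. Both are (simplification alert) just a while loop around a blocking call that returns control when a new work item is queued by some other part of the application (or by the OS). The main difference is that I/O threads use the IOCP queue (managed by the OS), while normal worker threads have their own queue, which is managed entirely by .NET and is accessible to the application programmer.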
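The "while loop over a queue" simplification can be sketched with a toy worker. This is only an illustration, not how the .NET thread pool is actually implemented; ToyWorkerLoop and RunAll are hypothetical names. Here the thread blocks on a managed BlockingCollection. An I/O thread would block on the OS completion port instead (on Windows, via something like GetQueuedCompletionStatus).

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical toy: one dedicated thread draining a managed work queue.
public static class ToyWorkerLoop
{
    // Runs the given work items on a single worker thread and
    // returns how many items that thread executed.
    public static int RunAll(params Action[] items)
    {
        using (var queue = new BlockingCollection<Action>())
        {
            int executed = 0;
            var worker = new Thread(() =>
            {
                // The "while loop": block until an item arrives, run it, repeat.
                // The loop ends once CompleteAdding has been called and the queue is empty.
                foreach (Action item in queue.GetConsumingEnumerable())
                {
                    item();
                    executed++;
                }
            });
            worker.Start();

            // "Some other part of the application" queues the work.
            foreach (Action item in items)
            {
                queue.Add(item);
            }
            queue.CompleteAdding();

            worker.Join(); // Join also makes the worker's writes visible here.
            return executed;
        }
    }

    public static void Main()
    {
        int sum = 0;
        int executed = RunAll(() => sum += 1, () => sum += 2, () => sum += 3);
        Console.WriteLine("Executed {0} items, sum = {1}", executed, sum);
    }
}
```

The only part that would differ for an I/O thread in this picture is where the blocking wait happens: on a queue owned by the OS rather than on one owned by the runtime.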

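As a side note, don't forget that your request might have completed synchronously. Maybe you're reading from a TCP stream in a while loop, 512 bytes at a time. If the socket buffer has enough data in it, multiple ReadAsync calls can return immediately without any thread switching at all. This isn't usually a problem, because I/O tends to be the most time-consuming thing you do in a typical application, so not having to wait for I/O is usually fine. However, bad code that depends on some part happening asynchronously (even though that isn't guaranteed) can easily break your application.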
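A minimal sketch of synchronous completion, assuming a MemoryStream as a stand-in for a socket whose buffer already holds the data (SyncCompletionDemo and ReadCompletedSynchronously are hypothetical names). Because nothing has to be waited for, the task returned by ReadAsync should already be complete when the call returns, and no completion port thread is involved.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class SyncCompletionDemo
{
    // Returns true if the read had already finished by the time ReadAsync returned.
    public static bool ReadCompletedSynchronously()
    {
        // Stand-in for a socket with 512 bytes already buffered (an assumption
        // made so the example runs without a network connection).
        using (var stream = new MemoryStream(new byte[512]))
        {
            var buffer = new byte[512];
            Task<int> read = stream.ReadAsync(buffer, 0, buffer.Length);

            // No await yet: we only inspect whether the task is already done.
            return read.IsCompleted;
        }
    }

    public static void Main()
    {
        Console.WriteLine("Completed synchronously: {0}", ReadCompletedSynchronously());
    }
}
```

Code that awaits such a task simply continues on the same thread, which is why logic that assumes "the continuation always runs later, on another thread" is fragile.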

Does it mean that it grabs a new thread pool thread for each completed IO operation? Or is it a dedicated number of threads for this?

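Creating a new thread for every I/O request would be so inefficient that it would defeat the purpose. Instead, the runtime starts off with a small number of threads (the exact number depends on your environment) and adds and removes worker threads as necessary (the exact algorithm likewise varies with your environment). Every major version of .NET has seen changes in this implementation, but the basic idea stays the same: the runtime does its best to create and maintain only as many threads as are needed to service all I/O efficiently. On my system (Windows 8.1, .NET 4.5.2), a brand new console application has only 3 threads in the process on entering Main, and this number doesn't increase until actual work is requested.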

Does it mean that I'll have 1000 IOCP threadpool threads simultaneously (sort of) running here, when all are finished?

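No. When you issue an I/O request, a thread will be waiting on a completion port to get the result and to call whatever callback was registered to handle it (be it via a BeginXXX method or as the continuation of a task). If you use a task and don't await it, that task simply ends there and the thread is returned to the thread pool.

What if you did await it? The results of 1000 I/O requests won't really arrive all at the same time, since interrupts don't all arrive at the same time, but let's say the interval is much shorter than the time we need to process them. In that case, the thread pool will keep spinning up threads to handle the results until it reaches a maximum, and any further requests will end up queueing on the completion port. Depending on how you configure it, those threads may take some time to spin up.

Consider the following (deliberately awful) toy program: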

using System;
using System.Diagnostics;
using System.IO;
using System.Threading;

static class Program {
    static void Main(string[] args) {
        printThreadCounts();
        var buffer = new byte[1024];
        const int requestCount = 30;
        int pendingRequestCount = requestCount;
        for (int i = 0; i != requestCount; ++i) {
            // Deliberately bad: the streams are never disposed and all reads share one buffer.
            var stream = new FileStream(
                @"C:\Windows\win.ini",
                FileMode.Open, FileAccess.Read, FileShare.ReadWrite,
                buffer.Length, FileOptions.Asynchronous
            );
            stream.BeginRead(
                buffer, 0, buffer.Length,
                delegate {
                    Interlocked.Decrement(ref pendingRequestCount);
                    // Deliberately bad: block the completion port thread forever.
                    Thread.Sleep(Timeout.Infinite);
                }, null
            );
        }
        // Print the thread counts once per second until every callback has started.
        do {
            printThreadCounts();
            Thread.Sleep(1000);
        } while (Thread.VolatileRead(ref pendingRequestCount) != 0);
        Console.WriteLine(new String('=', 40));
        printThreadCounts();
    }

    // Prints the busy worker and completion port threads (max minus available)
    // and the total number of threads in the process.
    private static void printThreadCounts() {
        int completionPortThreads, maxCompletionPortThreads;
        int workerThreads, maxWorkerThreads;
        ThreadPool.GetMaxThreads(out maxWorkerThreads, out maxCompletionPortThreads);
        ThreadPool.GetAvailableThreads(out workerThreads, out completionPortThreads);
        Console.WriteLine(
            "Worker threads: {0}, Completion port threads: {1}, Total threads: {2}",
            maxWorkerThreads - workerThreads,
            maxCompletionPortThreads - completionPortThreads,
            Process.GetCurrentProcess().Threads.Count
        );
    }
}

On my system (which has 8 logical processors), the output is as follows (results may vary on your system):

Worker threads: 0, Completion port threads: 0, Total threads: 3
Worker threads: 0, Completion port threads: 8, Total threads: 12
Worker threads: 0, Completion port threads: 9, Total threads: 13
Worker threads: 0, Completion port threads: 11, Total threads: 15
Worker threads: 0, Completion port threads: 13, Total threads: 17
Worker threads: 0, Completion port threads: 15, Total threads: 19
Worker threads: 0, Completion port threads: 17, Total threads: 21
Worker threads: 0, Completion port threads: 19, Total threads: 23
Worker threads: 0, Completion port threads: 21, Total threads: 25
Worker threads: 0, Completion port threads: 23, Total threads: 27
Worker threads: 0, Completion port threads: 25, Total threads: 29
Worker threads: 0, Completion port threads: 27, Total threads: 31
Worker threads: 0, Completion port threads: 29, Total threads: 33
========================================
Worker threads: 0, Completion port threads: 30, Total threads: 34

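When we issue 30 asynchronous requests, the thread pool quickly makes 8 threads available to handle the results, but after that it only spins up new threads at a leisurely pace of about 2 per second. This demonstrates that if you want to make proper use of system resources, you'd better make sure your I/O processing completes quickly. Indeed, let's change our delegate to the following, which represents the "proper" processing of the request: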

stream.BeginRead(
    buffer, 0, buffer.Length,
    ar => {
        stream.EndRead(ar);
        Interlocked.Decrement(ref pendingRequestCount);
    }, null
);

Result:

Worker threads: 0, Completion port threads: 0, Total threads: 3
Worker threads: 0, Completion port threads: 1, Total threads: 11
========================================
Worker threads: 0, Completion port threads: 0, Total threads: 11

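Again, results may vary across systems and across runs. Here we barely glimpse the completion port threads in action, while the 30 requests we issued complete without any new threads being spun up. You should find that you can change "30" to "100" or even "100000": our loop can't start requests faster than they complete. Note, however, that the results are skewed heavily in our favor, because the "I/O" reads the same bytes over and over and will be serviced from the operating system cache rather than from disk. This isn't meant to demonstrate realistic throughput, of course, only the difference in overhead.

To repeat these results with worker threads rather than completion port threads, simply change FileOptions.Asynchronous to FileOptions.None. This makes file access synchronous, and the asynchronous operations will be completed on worker threads rather than through the completion port: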

Worker threads: 0, Completion port threads: 0, Total threads: 3
Worker threads: 8, Completion port threads: 0, Total threads: 15
Worker threads: 9, Completion port threads: 0, Total threads: 16
Worker threads: 10, Completion port threads: 0, Total threads: 17
Worker threads: 11, Completion port threads: 0, Total threads: 18
Worker threads: 12, Completion port threads: 0, Total threads: 19
Worker threads: 13, Completion port threads: 0, Total threads: 20
Worker threads: 14, Completion port threads: 0, Total threads: 21
Worker threads: 15, Completion port threads: 0, Total threads: 22
Worker threads: 16, Completion port threads: 0, Total threads: 23
Worker threads: 17, Completion port threads: 0, Total threads: 24
Worker threads: 18, Completion port threads: 0, Total threads: 25
Worker threads: 19, Completion port threads: 0, Total threads: 26
Worker threads: 20, Completion port threads: 0, Total threads: 27
Worker threads: 21, Completion port threads: 0, Total threads: 28
Worker threads: 22, Completion port threads: 0, Total threads: 29
Worker threads: 23, Completion port threads: 0, Total threads: 30
Worker threads: 24, Completion port threads: 0, Total threads: 31
Worker threads: 25, Completion port threads: 0, Total threads: 32
Worker threads: 26, Completion port threads: 0, Total threads: 33
Worker threads: 27, Completion port threads: 0, Total threads: 34
Worker threads: 28, Completion port threads: 0, Total threads: 35
Worker threads: 29, Completion port threads: 0, Total threads: 36
========================================
Worker threads: 30, Completion port threads: 0, Total threads: 37

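The thread pool spins up one worker thread per second rather than the two per second it started for completion port threads. Obviously these numbers are implementation-dependent and may change in new releases.

Finally, let's demonstrate the use of ThreadPool.SetMinThreads to ensure that a minimum number of threads is available to complete requests. If we go back to FileOptions.Asynchronous and add ThreadPool.SetMinThreads(50, 50) to the Main of our toy program, the result is: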

Worker threads: 0, Completion port threads: 0, Total threads: 3
Worker threads: 0, Completion port threads: 31, Total threads: 35
========================================
Worker threads: 0, Completion port threads: 30, Total threads: 35

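Now, instead of patiently adding two threads per second, the thread pool keeps spinning up threads until the maximum is reached (which doesn't happen in this case, so the final count stays at 30). Of course, all 30 of these threads are stuck in infinite waits - but if this had been a real system, those 30 threads would presumably now be doing useful, if not terribly efficient, work. I wouldn't try this with 100000 requests, though.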
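Note that ThreadPool.SetMinThreads(50, 50) raises the worker thread minimum as well. If only the completion port threads matter, a sketch along the following lines reads the current minimums first and changes just the second value. MinIoThreads and RaiseMinCompletionPortThreads are hypothetical names, and the value 50 is only the figure used in the experiment above, not a recommendation.

```csharp
using System;
using System.Threading;

public static class MinIoThreads
{
    // Raises only the completion port thread minimum, leaving the worker
    // thread minimum as it was. Returns false if the pool rejected the change
    // (for example, because the value exceeds the configured maximum).
    public static bool RaiseMinCompletionPortThreads(int desired)
    {
        int minWorker, minIo;
        ThreadPool.GetMinThreads(out minWorker, out minIo);

        if (desired <= minIo)
        {
            return true; // Already at or above the requested minimum.
        }
        return ThreadPool.SetMinThreads(minWorker, desired);
    }

    public static void Main()
    {
        bool accepted = RaiseMinCompletionPortThreads(50);

        int minWorker, minIo;
        ThreadPool.GetMinThreads(out minWorker, out minIo);
        Console.WriteLine(
            "Accepted: {0}, min worker threads: {1}, min completion port threads: {2}",
            accepted, minWorker, minIo);
    }
}
```

Checking the return value is worthwhile because SetMinThreads reports failure through it rather than by throwing.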

Does it mean that I'll have 1000 IOCP threadpool threads simultaneously (sort of) running here, when all are finished?

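No, not at all. Just like the worker threads available in the ThreadPool, we also have "Completion port threads".

These threads are dedicated to asynchronous I/O. They are not created up front. They are created on demand, the same way worker threads are, and they will eventually be destroyed when the thread pool decides so.

By "borrowed briefly" the author means that some arbitrary thread from the "Completion port threads" (of the ThreadPool) is used to notify the process that the IO has completed. It doesn't execute any lengthy operation, only the IO completion notification.

As noted earlier, IOCP threads and worker threads are separate resources within the thread pool.

Registration with the IOCP for overlapped IO happens regardless of whether you await the IO operation or not. await is a higher-level mechanism that has nothing to do with that registration.

With a simple test, you can see that the application still uses IOCP even though the calls to GetUrl are never awaited: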

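using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

internal static class Program
{
    private static void Main(string[] args)
    {
        // Monitor: print the number of available IOCP threads every 10 ms.
        Task.Run(() =>
        {
            int count = 0;
            while (count < 30)
            {
                int workerThreads; // unused, required by the API
                int iocpThreads;
                ThreadPool.GetAvailableThreads(out workerThreads, out iocpThreads);
                Console.WriteLine("Current number of IOCP threads available: {0}", iocpThreads);
                count++;
                Thread.Sleep(10);
            }
        });

        for (int i = 0; i < 30; i++)
        {
            // Fire and forget: the returned task is deliberately not awaited.
            GetUrl(@"http://www.ynet.co.il");
        }

        Console.ReadKey();
    }

    private static async Task<string> GetUrl(string url)
    {
        // A new HttpClient per call is fine for this test; real code would normally share one.
        var httpClient = new HttpClient();
        var response = await httpClient.GetAsync(url);
        return await response.Content.ReadAsStringAsync();
    }
}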

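Depending on how long each request takes to execute, you'll see the number of available IOCP threads shrink as the requests are made. The more concurrent requests you try to create, the fewer threads will be available to you.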
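The test above fires its requests and forgets them. As a variation, here is a sketch that starts all operations without awaiting them one by one, keeps the returned tasks, and awaits them together at the end. The I/O is issued the same way either way; keeping the tasks just lets the caller observe the results. It reads a local temporary file instead of a URL so that it runs without network access, which is an assumption of this sketch. FireAndCollectDemo, ReadLengthAsync and ReadManyAsync are hypothetical names. On Windows, opening a FileStream with useAsync: true requests overlapped I/O; other platforms may service it differently.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public static class FireAndCollectDemo
{
    // Reads the whole file asynchronously and returns its length in bytes.
    public static async Task<int> ReadLengthAsync(string path)
    {
        using (var stream = new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, useAsync: true))
        {
            var buffer = new byte[4096];
            int total = 0;
            int read;
            while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                total += read;
            }
            return total;
        }
    }

    // Starts 'count' reads up front, then awaits all of them together.
    public static async Task<int[]> ReadManyAsync(string path, int count)
    {
        Task<int>[] tasks = Enumerable.Range(0, count)
            .Select(i => ReadLengthAsync(path))
            .ToArray(); // ToArray forces every read to start now.
        return await Task.WhenAll(tasks);
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[10000]);
        try
        {
            int[] lengths = ReadManyAsync(path, 30).GetAwaiter().GetResult();
            Console.WriteLine("Completed {0} reads of {1} bytes each", lengths.Length, lengths[0]);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Because each callback here does almost no work, this should need very few completion port threads, in line with the earlier observation that short callbacks keep the pool small.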