How is non-blocking IO for regular files implemented in .NET on Linux?

As far as I know, all IO on regular files is always blocking on Linux (see here). Yet you can still call File.ReadBLAHAsync(...)/File.WriteBLAHAsync(...) or other file-related async operations.

Are these wrappers faking the async calls just to keep backward compatibility, or to satisfy the synchronization context?

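isAsync allows you to control whether the file is opened for asynchronous or synchronous I/O. The default value is false, which means synchronous I/O. If you open a FileStream for synchronous I/O but later use any of its *Async() methods, they will perform synchronous I/O (without cancellation support) on a ThreadPool thread, which might not scale as well as it would if the FileStream had been opened for asynchronous I/O.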
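To make the flag concrete, here is a minimal sketch, assuming .NET Core 3.0 or later. The FileStream constructor parameter is named useAsync (the same name used in the code below); the class name, the helper CountBytesAsync and the temporary file are hypothetical and exist only for illustration. Both calls return the same data; what differs is how the runtime carries out the read behind ReadAsync.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public class UseAsyncDemo
{
    // Hypothetical helper: reads the whole file via ReadAsync and returns the byte count.
    public static async Task<long> CountBytesAsync(string path, bool useAsync)
    {
        await using var fs = new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.Read,
            bufferSize: 4096, useAsync: useAsync);

        var buffer = new byte[4096];
        long total = 0;
        int read;
        // ReadAsync returns 0 at end of file, which ends the loop.
        while ((read = await fs.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            total += read;
        }
        return total;
    }

    public static async Task Main()
    {
        // Scratch file used only for this illustration.
        string path = Path.GetTempFileName();
        await File.WriteAllBytesAsync(path, new byte[10_000]);

        // Opened for synchronous I/O: the *Async call is still allowed.
        Console.WriteLine(await CountBytesAsync(path, useAsync: false));
        // Opened for asynchronous I/O.
        Console.WriteLine(await CountBytesAsync(path, useAsync: true));

        File.Delete(path);
    }
}
```

The calling code is identical in both cases, which is why the difference is easy to miss when reading application code.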

using System;
using System.IO;
using System.Threading;

namespace NonBlock
{
    class ReadWrite
    {
        static async void Begin(FileStream s)
        {
            Console.WriteLine();

            try
            {
                byte[] buffer = new byte[4096];
                while (true)
                {
                    var read = await s.ReadAsync(buffer, 0, 4096);
                    Console.WriteLine($"Read {read} bytes");
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex);
            }
        }

        static void Main(string[] args)
        {
            Console.WriteLine();
            var fs = new FileStream(
                "/proc/self/fd/0",
                FileMode.Open,
                FileAccess.Read,
                FileShare.None,
                4096, useAsync: true);

            Begin(fs);

            Thread.Sleep(5000);
            fs.Dispose();
            Console.WriteLine();
            Thread.Sleep(-1);
        }
    }
}

It is worth pointing out that there are several contexts at play here.

The Linux operating system

From Non-Blocking descriptors:

By default, read on any descriptor blocks if there’s no data available. The same applies to write or send. This applies to operations on most descriptors except disk files, since writes to disk never happen directly but via the kernel buffer cache as a proxy. The only time when writes to disk happen synchronously is when the O_SYNC flag was specified when opening the disk file.

Any descriptor (pipes, FIFOs, sockets, terminals, pseudo-terminals, and some other types of devices) can be put in the nonblocking mode. When a descriptor is set in nonblocking mode, an I/O system call on that descriptor will return immediately, even if that request can’t be immediately completed (and will therefore result in the process being blocked otherwise). The return value can be either of the following:

  • an error: when the operation cannot be completed at all
  • a partial count: when the input or output operation can be partially completed
  • the entire result: when the I/O operation could be fully completed

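As stated above, non-blocking descriptors prevent pipes (or sockets, or ...) from blocking continuously. They are not meant to be used with disk files, however, because whether you want to read the whole file or just part of it, the data is already there. It is not going to arrive at some point in the future, so you can start processing it right away.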

Quoting your linked post:

Regular files are always readable and they are also always writeable. This is clearly stated in the relevant POSIX specifications. I cannot stress this enough. Putting a regular file in non-blocking has ABSOLUTELY no effects other than changing one bit in the file flags.

Reading from a regular file might take a long time. For instance, if it is located on a busy disk, the I/O scheduler might take so much time that the user will notice that the application is frozen.

Nevertheless, non-blocking mode will not fix it. It will simply not work. Checking a file for readability or writeability always succeeds immediately. If the system needs time to perform the I/O operation, it will put the task in non-interruptible sleep from the read or write system call. In other words, if you can assume that a file descriptor refers to a regular file, do not waste your time (or worse, other people's time) in implementing non-blocking I/O.

The only safe way to read data from or write data to a regular file while not blocking a task... consists of not performing the operation, not in that particular task anyway. Concretely, you need to create a separate thread (or process), or use asynchronous I/O (functions whose name starts with aio_). Whether you like it or not, and even if you think multiple threads suck, there are no other options.

The .NET runtime

It implements the async/await pattern to unblock the main event loop while I/O is being performed. As mentioned above:

Concretely, you need to create a separate thread (or process), or use asynchronous I/O (functions whose name starts with aio_). Whether you like it or not, and even if you think multiple threads suck, there are no other options.

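The .NET thread pool will spawn additional processes as needed (ref ). So, ideally, when one of the .NET File.ReadAsync(...) / File.WriteAsync(...) overloads is called, the current thread (from the thread pool) initiates the I/O operation and then gives up control, which frees it to do other work. Before it does so, a continuation is attached to the I/O operation. When the I/O device signals that the operation has finished, the thread pool scheduler knows that the next free thread can pick up the continuation.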

To be clear, this is all about responsiveness. Any code that needs the I/O to complete still has to wait; it just does not "block" the application.
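A minimal sketch of that behaviour, assuming .NET Core 2.0 or later: the caller starts the read, keeps doing other work, and only the code after the await (the continuation) has to wait for the I/O. The class name, the helper LengthAsync and the temporary file are hypothetical names chosen for this illustration.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public class ResponsivenessDemo
{
    // Hypothetical helper: everything after the await is the continuation,
    // which runs only once the read has completed.
    public static async Task<int> LengthAsync(string path)
    {
        string text = await File.ReadAllTextAsync(path);
        return text.Length;
    }

    public static async Task Main()
    {
        // Scratch file used only for this illustration.
        string path = Path.GetTempFileName();
        await File.WriteAllTextAsync(path, "some file content");

        // Start the read but do not wait for it yet.
        Task<int> pending = LengthAsync(path);

        // The caller stays responsive while the read is in flight.
        for (int i = 0; i < 3; i++)
        {
            Console.WriteLine($"Doing other work {i}");
        }

        // Code that needs the result still has to wait for it here.
        Console.WriteLine($"Characters read: {await pending}");

        File.Delete(path);
    }
}
```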

Back to the OS

The thread giving up control, which eventually leads to it being freed up, can be achieved on Windows:

https://docs.microsoft.com/en-us/troubleshoot/windows/win32/asynchronous-disk-io-synchronous

Asynchronous I/O has not been part of Linux for very long. The flow we have here is described at:

https://devblogs.microsoft.com/dotnet/file-io-improvements-in-dotnet-6/#unix

Unix-like systems don’t expose async file IO APIs (except of the new io_uring which we talk about later). Anytime user asks FileStream to perform async file IO operation, a synchronous IO operation is being scheduled to Thread Pool. Once it’s dequeued, the blocking operation is performed on a dedicated thread.
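The flow described in that quote can be approximated by hand. The sketch below is not the runtime's actual implementation; it only illustrates the idea with Task.Run, and ThreadPoolFileIo.ReadOnThreadPoolAsync is a hypothetical helper name. The blocking Read call still blocks, just on a thread-pool thread instead of the caller's thread.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class ThreadPoolFileIo
{
    // Hypothetical helper: queue a synchronous, blocking read to the thread pool
    // and hand the caller a Task that completes when the read returns.
    public static Task<int> ReadOnThreadPoolAsync(
        FileStream stream, byte[] buffer, int offset, int count)
    {
        return Task.Run(() => stream.Read(buffer, offset, count));
    }
}

public class Program
{
    public static async Task Main()
    {
        // Scratch file used only for this illustration.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "hello");

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            var buffer = new byte[16];
            Task<int> pending =
                ThreadPoolFileIo.ReadOnThreadPoolAsync(fs, buffer, 0, buffer.Length);

            Console.WriteLine("Caller is free while the read blocks elsewhere");

            int read = await pending;
            Console.WriteLine($"Read {read} bytes");
        }

        File.Delete(path);
    }
}
```

The cost of this approach is one occupied thread per in-flight operation, which is why it may not scale as well as true asynchronous I/O.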

Python's asyncio implementation suggests a similar flow:

asyncio does not support asynchronous operations on the filesystem. Even if files are opened with O_NONBLOCK, read and write will block.

...

The Linux kernel provides asynchronous operations on the filesystem (aio), but it requires a library and it doesn't scale with many concurrent operations. See aio.

...

For now, the workaround is to use aiofiles that uses threads to handle files.

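Closing remarks

The concepts behind Linux's non-blocking descriptors (and their polling mechanism) are not what makes async I/O tick on Windows.

As mentioned by @Damien_The_Unbeliever, there is a relatively new Linux kernel interface (io_uring) that allows a continuation flow similar to the one on Windows. However, the following links confirm that this is not implemented as of .NET 6: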