Are native OS operations optimized on a per-process basis while writing to the same file?

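I am trying to understand some of the internal details of file write operations performed through managed code in C#. Suppose I have written the code below to write some content to a log file: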

using System.IO;
public void Log(string logMessage, LogLevel logLevel)
{
    if (Directory.Exists(LogFileDirectory))
    {
        using (StreamWriter streamWriter = new StreamWriter(LogFileDirectory + "log.txt",true))
        {
            streamWriter.WriteLine(DateTime.Now.ToString() + " - " + logMessage);                                    
        }
    }
}

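For the file to be written to disk, I can list a few things that must happen:

The internal detail I am trying to understand is: do the above steps get repeated for each new instance of the StreamWriter class that I create in the application, or are there things which the operating system (OS) or the process can optimize or cache if it is the same process asking, every time, to write something to exactly the same file?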

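The .NET classes are just the tip of the iceberg, a very thin layer on top of the native Win32 functions. A lot happens once you cross the managed layer.

Simplified view: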

    .NET layer (managed)
-----------------------------
    Win32 layer (User Mode)
-----------------------------
    Drivers (Kernel Mode)

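For actual disk I/O, you can think of the "Win32 layer" as a medium-thick layer. Actual disk I/O, checking whether a file exists, access-permission checks and so on do not happen at this layer. Forget about physical disk movement. This layer coordinates with the kernel (which comprises device drivers, file-system drivers, the I/O manager and other system components); it checks whether the handle is valid, whether the parameters passed are valid, and whether the operation is allowed (for example, a "write" on a file opened "read-only" is rejected). If you look at the code of the .NET classes in dotPeek or any similar decompiler, you will see managed code calling Win32 APIs.

Actual disk I/O is performed by kernel-mode components. This is the thickest layer for disk I/O (or network I/O): the core, the tentacles. It handles accessing the physical disk drive, performing security checks, handling asynchronous I/O, firing APCs/DPCs, calling other drivers in the driver stack (file-system drivers, mini-filter drivers, monitors and so on), and making sure that every file handle the application did not explicitly close is closed when the process exits. Anti-virus components run at this level; they can log file I/O operations, block them, or even modify them completely. On a bare OS, most disk I/O is performed by Microsoft-supplied drivers. Anti-virus software, specific device drivers (for example for your favourite hard disk), other monitoring drivers (such as those used by Process Monitor or Wireshark) and other drivers may be installed and take part in the disk I/O requests issued by user-mode/.NET applications.

Most drivers are loaded at boot time or on demand. They are not loaded when file open/close/read/write is called.

Windows is a complex and huge operating system. Many I/O elements are scattered across the kernel world (mostly) and the Win32 world (subsystem DLLs). You cannot say that "file permission" is handled only by the kernel or only by user mode; it is a combination of the two. Caching, the memory manager, the storage manager and many other "lower" user/kernel components do this work on behalf of the user application.

Different versions of Windows do things differently.

You cannot say that kernel I/O is the fastest and .NET I/O is the slowest. It is pretty much the same. However, crossing into the kernel (from user mode, .NET included) costs some CPU cycles, so an application should ideally reduce its number of I/O calls, for example by reading 10K bytes once rather than reading 10 bytes 1000 times.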
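The point about reducing I/O calls applies to writes as well. Below is a minimal sketch, not a benchmark; the class and method names are hypothetical, and passing a bufferSize of 1 is used here on the assumption that it effectively turns off FileStream's own buffering so that each Write call is handed to the OS individually.

```csharp
using System;
using System.IO;

public static class WriteBatchingSketch
{
    // Many small calls: one Write (and so roughly one kernel transition) per chunk.
    public static void WriteInSmallChunks(string path, byte[] data, int chunkSize)
    {
        // bufferSize: 1 is assumed to bypass FileStream's internal buffer.
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write,
                                       FileShare.None, bufferSize: 1))
        {
            for (int offset = 0; offset < data.Length; offset += chunkSize)
            {
                int count = Math.Min(chunkSize, data.Length - offset);
                fs.Write(data, offset, count);
            }
        }
    }

    // One large call: the same bytes handed over in a single Write.
    public static void WriteInOneCall(string path, byte[] data)
    {
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write,
                                       FileShare.None, bufferSize: 1))
        {
            fs.Write(data, 0, data.Length);
        }
    }
}
```

Both methods produce the same file; only the number of calls differs. With the default buffer size, FileStream would coalesce the small writes for you, which is exactly why the default is usually good enough.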

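In the end, I would just say: you shouldn't care!

I would first ask why this matters. Even if it were not native C#, you cannot change it, because technically the call lives outside the editable part of your code: it is part of System.dll and is interpreted by the OS through its pipeline. How it is interpreted at the OS level does not matter, as long as you know it works. When you get into Xamarin/Mono, the handling is the same across every OS: the OS takes over, and basic read or write commands are sent to it to be processed on its stack whenever it is free to do so (which can sometimes take a while, depending on how busy things are in the mobile world). You cannot easily speed this up or optimize it (you would essentially be building a new language, which nobody would recommend).

As Ajay said above, in the end you shouldn't really care, because in the long run you cannot really change its behaviour.

Here is some material you can read to understand how I/O works in general and what optimization attempts exist: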

https://www.wilderssecurity.com/threads/trying-to-understand-i-o-nomenclature.286453/

https://docs.microsoft.com/en-us/dotnet/standard/io/

http://enterprisecraftsmanship.com/2014/12/13/io-threads-explained/

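I hope this helps you learn more and find what you are looking for (I am still not sure what you want to know or why; you would be better off contacting the .NET Core team for more information).

As I understand your question, it can be broken down into the following parts.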

  1. The internal detail I'm trying to understand is, do the above steps get repeated for each new instance of StreamWriter class that I create in the application

  2. Are there things which the operating system (OS) or process can optimize or cache if every time it is same process who is asking to write something to the exactly same file

  3. Your special bounty requirement "I also want to understand if OS applies some extra intelligence when the file read/write requests are coming from same process or different processes? Or does OS remains agnostic of the process who is requesting the read/write operation".


#Answer to the first question

Disclaimer : The below is only relevant to the actual code you have written (as is). If it’s changed slightly a lot of the implementation details become irrelevant.

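In actual fact, very little of what you describe above is repeated (or even done) each time you create a StreamWriter. However, things do happen.

Let's walk through the .NET source for creating a StreamWriter the way you have.

Creating the StreamWriter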

using (StreamWriter streamWriter = new StreamWriter(LogFileDirectory + "log.txt",true))

The call chain is as follows

  1. public StreamWriter(String path, bool append)

Initializes a new instance of the StreamWriter class for the specified file by using the default encoding and buffer size. If the file exists, it can be either overwritten or appended to. If the file does not exist, this constructor creates a new file.

  1. public StreamWriter(String path, bool append, Encoding encoding, int bufferSize)

  2. internal StreamWriter(String path, bool append, Encoding encoding, int bufferSize, bool checkHost)

  3. private static Stream CreateFile(String path, bool append, bool checkHost)

It calls the following with the FileMode.Append flag.

Opens the file if it exists and seeks to the end of the file, or creates a new file. This requires FileIOPermissionAccess.Append permission. FileMode.Append can be used only in conjunction with FileAccess.Write. Trying to seek to a position before the end of the file throws an IOException exception, and any attempt to read fails and throws a NotSupportedException exception.

  1. internal FileStream(String path, FileMode mode, FileAccess access, FileShare share, int bufferSize, FileOptions options, String msgPath, bool bFromProxy, bool useLongPath, bool checkHost)
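Expressed through the public API, the chain up to this point amounts to roughly the following. This is a sketch based on my reading of the .NET Framework reference source: FileShare.Read, the 4096-byte stream buffer, FileOptions.SequentialScan, the BOM-less UTF-8 encoding and the 1024-char writer buffer are assumptions taken from that reading, and OpenForAppend is a hypothetical name.

```csharp
using System.IO;
using System.Text;

public static class StreamWriterEquivalentSketch
{
    // Approximately what new StreamWriter(path, true) sets up, written out by hand.
    public static StreamWriter OpenForAppend(string path)
    {
        var stream = new FileStream(
            path,
            FileMode.Append,             // open or create, then seek to the end
            FileAccess.Write,            // Append is only valid with write access
            FileShare.Read,              // assumed: other handles may read, not write
            4096,                        // assumed default FileStream buffer (bytes)
            FileOptions.SequentialScan); // assumed hint passed by StreamWriter

        // Assumed defaults: UTF-8 without a byte-order mark, 1024-char buffer.
        return new StreamWriter(stream, new UTF8Encoding(false), 1024);
    }
}
```

Other runtimes (.NET Core, Mono) may pick different defaults, so treat the constants as illustrative rather than guaranteed.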

As you can see, for anyone playing along at home, all we are doing is creating a FileStream. From here we marshal some security attributes and then call:

  1. private void Init(String path, FileMode mode, FileAccess access, int rights, bool useRights, FileShare share, int bufferSize, FileOptions options, Win32Native.SECURITY_ATTRIBUTES secAttrs, String msgPath, bool bFromProxy, bool useLongPath, bool checkHost)

A lot happens at this point: permissions are checked, the file type is checked, the handle is checked. The crux, however, is:

  1. internal static SafeFileHandle SafeCreateFile(String lpFileName, int dwDesiredAccess, System.IO.FileShare dwShareMode, SECURITY_ATTRIBUTES securityAttrs, System.IO.FileMode dwCreationDisposition, int dwFlagsAndAttributes, IntPtr hTemplateFile)

which resolves to a DllImport

[DllImport(KERNEL32, SetLastError=true, CharSet=CharSet.Auto, BestFitMapping=false)]
[ResourceExposure(ResourceScope.Machine)]
private static extern SafeFileHandle CreateFile(String lpFileName, int dwDesiredAccess, System.IO.FileShare dwShareMode, SECURITY_ATTRIBUTES securityAttrs, System.IO.FileMode dwCreationDisposition, int dwFlagsAndAttributes, IntPtr hTemplateFile);

This is where our .NET story ends, with our beloved KERNEL32 CreateFile function

Creates or opens a file or I/O device. The most commonly used I/O devices are as follows: file, file stream, directory, physical disk, volume, console buffer, tape drive, communications resource, mailslot, and pipe. The function returns a handle that can be used to access the file or device for various types of I/O depending on the file or device and the flags and attributes specified.

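If you have ever worked with CreateFile, you will know there is a lot here about flags, caching and buffering, and quite frankly a lot of things that are not even related to the file system. That is because it is one of the oldest user-mode API calls around, and it does all sorts of things. However, if you follow the .NET source (in this case), it does not actually use those extended features.

The only major exceptions are: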

  • FILE_APPEND_DATA

To write to the end of file, specify both the Offset and OffsetHigh members of the OVERLAPPED structure as 0xFFFFFFFF. This is functionally equivalent to previously calling the CreateFile function to open hFile using FILE_APPEND_DATA access.

  • FILE_FLAG_OVERLAPPED, the flag that indicates asynchronous I/O (which you have not set in this case)

The file or device is being opened or created for asynchronous I/O.

When subsequent I/O operations are completed on this handle, the event specified in the OVERLAPPED structure will be set to the signaled state.

If this flag is specified, the file can be used for simultaneous read and write operations.

If this flag is not specified, then I/O operations are serialized, even if the calls to the read and write functions specify an OVERLAPPED structure.

Synchronous and asynchronous I/O handles

If a file or device is opened for synchronous I/O (that is, FILE_FLAG_OVERLAPPED is not specified), subsequent calls to functions such as WriteFile can block execution of the calling thread until one of the following events occurs:

  • The I/O operation completes (in this example, a data write).
  • An I/O error occurs. (For example, the pipe is closed from the other end.)
  • An error was made in the call itself (for example, one or more parameters are not valid).
  • Another thread in the process calls the CancelSynchronousIo function using the blocked thread's thread handle, which terminates I/O for that thread, failing the I/O operation.
  • The blocked thread is terminated by the system; for example, the process itself is terminated, or another thread calls the TerminateThread function using the blocked thread's handle. (This is generally considered a last resort and not good application design.)

Synchronous and asynchronous I/O

In some cases, this delay may be unacceptable to the application's design and purpose, so application designers should consider using asynchronous I/O with appropriate thread synchronization objects such as I/O completion ports. For more information about thread synchronization, see About Synchronization. A process opens a file for asynchronous I/O in its call to CreateFile by specifying the FILE_FLAG_OVERLAPPED flag in the dwFlagsAndAttributes parameter. If FILE_FLAG_OVERLAPPED is not specified, the file is opened for synchronous I/O. When the file has been opened for asynchronous I/O, a pointer to an OVERLAPPED structure is passed into the call to ReadFile and WriteFile. When performing synchronous I/O, this structure is not required in calls to ReadFile and WriteFile.

CreateFile provides for creating a file or device handle that is either synchronous or asynchronous. A synchronous handle behaves such that I/O function calls using that handle are blocked until they complete, while an asynchronous file handle makes it possible for the system to return immediately from I/O function calls, whether they completed the I/O operation or not. As stated previously, this synchronous versus asynchronous behavior is determined by specifying FILE_FLAG_OVERLAPPED within the dwFlagsAndAttributes parameter. There are several complexities and potential pitfalls when using asynchronous I/O; for more information see Synchronous and Asynchronous I/O,

I/O completion ports

I/O completion ports provide an efficient threading model for processing multiple asynchronous I/O requests on a multiprocessor system. When a process creates an I/O completion port, the system creates an associated queue object for requests whose sole purpose is to service these requests. Processes that handle many concurrent asynchronous I/O requests can do so more quickly and efficiently by using I/O completion ports in conjunction with a pre-allocated thread pool than by creating threads at the time they receive an I/O request.

How I/O completion ports work

The CreateIoCompletionPort function creates an I/O completion port and associates one or more file handles with that port. When an asynchronous I/O operation on one of these file handles completes, an I/O completion packet is queued in first-in-first-out (FIFO) order to the associated I/O completion port. One powerful use for this mechanism is to combine the synchronization point for multiple file handles into a single object, although there are also other useful applications. Please note that while the packets are queued in FIFO order they may be dequeued in a different order.

Note: FileStream can use completion ports if you set useAsync to true in one of the FileStream overloads, but you did not

public FileStream(String path, FileMode mode, FileAccess access, FileShare share, int bufferSize, bool useAsync)
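For comparison, here is a minimal sketch of opting in to that overload. AsyncAppendSketch and AppendLineAsync are hypothetical names; the share mode and buffer size are arbitrary choices for the example, and whether the runtime actually backs the handle with overlapped I/O and a completion port depends on the platform.

```csharp
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class AsyncAppendSketch
{
    // Appends one line using a FileStream opened with useAsync: true.
    public static async Task AppendLineAsync(string path, string line)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(line + "\n");

        using (var fs = new FileStream(path, FileMode.Append, FileAccess.Write,
                                       FileShare.Read, 4096, useAsync: true))
        {
            // The calling thread is released while the write is pending.
            await fs.WriteAsync(bytes, 0, bytes.Length);
            await fs.FlushAsync();
        }
    }
}
```

For a single small log line the asynchronous path is unlikely to be faster; it mainly helps when many writes are in flight and threads would otherwise sit blocked.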

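The actual write

You chose WriteLine(), which is actually a TextWriter method, but before we get to that, a short note on FileStream. For atomically appended writes to a shared log file to work correctly:

  • The FileStream internal buffer needs to be large enough to hold all of the data for a single write
  • Data must be written into the buffer starting at position 0 so that it fits
  • The StreamWriter that wraps the FileStream to encode text must be flushed completely before each write

Of these requirements, the first (1) is the least pleasant to deal with. The buffer size is fixed when the FileStream is created (the 4096 argument above), so the only way to write a larger event atomically is to close the file and reopen it with a larger buffer.

Requirements (2) and (3) are met nicely by flushing the StreamWriter and the FileStream between writes.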
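The three requirements can be sketched as follows. This is an illustration under assumptions, not a guaranteed-atomic implementation: SharedLogAppender and TryAppend are hypothetical names, the text is encoded by hand instead of through a StreamWriter (so there is no encoder state left to flush), and whether the single resulting OS write is truly atomic across processes depends on how the runtime opens the handle, which this sketch does not control.

```csharp
using System;
using System.IO;
using System.Text;

public sealed class SharedLogAppender : IDisposable
{
    private readonly FileStream _stream;
    private readonly int _bufferSize;

    public SharedLogAppender(string path, int bufferSize = 4096)
    {
        _bufferSize = bufferSize;
        // FileShare.ReadWrite lets other handles append to the same file.
        _stream = new FileStream(path, FileMode.Append, FileAccess.Write,
                                 FileShare.ReadWrite, bufferSize);
    }

    // Returns false when the event cannot fit in one buffer-sized write.
    public bool TryAppend(string message)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(message + Environment.NewLine);

        // Requirement 1: the whole event must fit in the internal buffer.
        if (bytes.Length > _bufferSize) return false;

        // Requirement 2: the buffer is empty here because every call ends with Flush.
        _stream.Write(bytes, 0, bytes.Length);

        // Requirement 3: push the event to the OS before the next one starts.
        _stream.Flush();
        return true;
    }

    public void Dispose()
    {
        _stream.Dispose();
    }
}
```

A caller that receives false can decide to truncate the message or, as described above, reopen the file with a larger buffer.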

  1. public virtual void WriteLine(String value)

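  2. public virtual void Write(char[] buffer, int index, int count)

And what a surprising little gem this is:

for (int i = 0; i < count; i++) Write(buffer[index + i]);

  1. public virtual void Write(char value)

When the buffer is full, it triggers a cascade of flushes that is a little hard to follow, but I will try to keep it simple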

if (charPos == charLen) Flush(false, false);

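where charLen = DefaultBufferSize, which is passed by default to one of the constructors you used to create the StreamWriter and is defined as follows:

internal const int DefaultBufferSize = 1024;   // char[]

  1. private void Flush(bool flushStream, bool flushEncoder)

From here, the two most important things are: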

if (count > 0)
     stream.Write(byteBuffer, 0, count);

// By definition, calling Flush should flush the stream, but this is
// only necessary if we passed in true for flushStream.  The Web
// Services guys have some perf tests where flushing needlessly hurts.
if (flushStream)
     stream.Flush();

Note : You have to love MS source code comments, those kids are a total riot
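The buffering described above can be observed from the outside. The sketch below swaps the FileStream for a MemoryStream purely so that the effect is easy to measure; FlushSketch and Measure are hypothetical names, and the text passed in is assumed to be shorter than the writer's internal char buffer.

```csharp
using System.IO;

public static class FlushSketch
{
    // Returns the backing stream's length before and after StreamWriter.Flush().
    public static (long before, long after) Measure(string text)
    {
        var backing = new MemoryStream();
        var writer = new StreamWriter(backing); // AutoFlush is false by default

        writer.Write(text);            // stays in the writer's char buffer
        long before = backing.Length;  // nothing has reached the stream yet

        writer.Flush();                // encodes the chars and calls stream.Write
        long after = backing.Length;

        return (before, after);
    }
}
```

For example, FlushSketch.Measure("hello") returns (0, 5): no bytes reach the underlying stream until the flush, and then the five UTF-8 bytes arrive in one Write.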

The second Flush() (if you follow it all the way) ends up in the same place below anyway. Remember that our StreamWriter is backed by the FileStream class, so once again we end up in the FileStream class's Write method

  1. public override void Write(byte[] array, int offset, int count)

  2. private unsafe void WriteCore(byte[] buffer, int offset, int count)

  3. private unsafe int WriteFileNative(SafeFileHandle handle, byte[] bytes, int offset, int count, NativeOverlapped* overlapped, out int hr)

which resolves to another DllImport

[DllImport(KERNEL32, SetLastError=true)]
[ResourceExposure(ResourceScope.None)]
internal static unsafe extern int WriteFile(SafeFileHandle handle, byte* bytes, int numBytesToWrite, out int numBytesWritten, IntPtr mustBeZero);

And before you know it we are back in KERNEL32 again, calling the WriteFile function

Writes data to the specified file or input/output (I/O) device. This function is designed for both synchronous and asynchronous operation. For a similar function designed solely for asynchronous operation

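Once again, there is a boatload of options to handle all sorts of situations. And once again, .NET (in this case) does not make much use of them.

From here our story moves on to Windows file caching, over which we have almost no control in .NET, although plenty of options are available to you in the raw API calls.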

By default, Windows caches file data that is read from disks and written to disks. This implies that read operations read file data from an area in system memory known as the system file cache, rather than from the physical disk. Correspondingly, write operations write file data to the system file cache rather than to the disk, and this type of cache is referred to as a write-back cache. Caching is managed per file object.
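The little control .NET does expose over this cache looks roughly like the sketch below. DurableWriteSketch and AppendDurably are hypothetical names; FileOptions.WriteThrough and FileStream.Flush(true) are real members, but how strictly they bypass the cache depends on the OS, the file system and the drive, so treat the durability claim as an assumption to verify on your own hardware.

```csharp
using System.IO;
using System.Text;

public static class DurableWriteSketch
{
    // Appends a line and asks the OS not to leave it sitting in the write-back cache.
    public static void AppendDurably(string path, string line)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(line + "\n");

        using (var fs = new FileStream(path, FileMode.Append, FileAccess.Write,
                                       FileShare.Read, 4096, FileOptions.WriteThrough))
        {
            fs.Write(bytes, 0, bytes.Length);

            // Flush(true) flushes FileStream's buffer and requests that
            // intermediate OS buffers be written to disk as well.
            fs.Flush(true);
        }
    }
}
```

This trades throughput for durability: every call waits for the device, which is the opposite of what the lazy writer is designed to do.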

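So, ignoring what actually happens in kernel mode, what did we actually do in user mode? Not a lot... When creating a stream, .NET performs a bunch of checks and balances in order to make one simple CreateFile Win32 API call, which in turn gives it a handle to hold. Sure, a bunch of IL gets executed, but at its most basic level it just uses the Win32 API to create a file and hold on to a handle (boiled down, past some .NET security and permission checks and so on).

And then? Well, we handle some encoding, write some bytes to memory, and then, at the predetermined buffer size, we write/flush them to disk using the WriteFile Win32 API call.

Did the operating system have to do anything here that any other simple user-mode file creation and write would not have to do? Not really...

The only caveat, again, is that .NET does its song and dance first. If you really want atomic access to the file system from user mode, consider calling these functions yourself and/or using I/O completion ports. That way you can take advantage of asynchronous work and/or sidestep .NET and get access to a bunch of extended parameters (although in most cases this will not make things any faster in a single application, because the API's default parameters are already optimized for the common cases such as writing).

If you are truly obsessive about processor instructions, it is easy to see the value in keeping the FileStream alive and continuing to write to it in append mode (taking the appropriate precautions) or, if you are at the Win32 API level, creating the file, holding on to the handle for successive writes, and using the asynchronous I/O facilities.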
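Keeping the stream alive could look like the sketch below. KeepOpenLogger is a hypothetical class, not something from the question; the lock and AutoFlush are the "precautions" I am assuming here, so that concurrent callers do not interleave lines and each line leaves the StreamWriter buffer promptly.

```csharp
using System;
using System.IO;

public sealed class KeepOpenLogger : IDisposable
{
    private readonly StreamWriter _writer;
    private readonly object _gate = new object();

    public KeepOpenLogger(string path)
    {
        // Opened once in append mode; the handle is reused for every Log call.
        _writer = new StreamWriter(path, true) { AutoFlush = true };
    }

    public void Log(string message)
    {
        // StreamWriter is not thread-safe, so serialize callers.
        lock (_gate)
        {
            _writer.WriteLine(DateTime.Now.ToString() + " - " + message);
        }
    }

    public void Dispose()
    {
        lock (_gate)
        {
            _writer.Dispose();
        }
    }
}
```

The trade-off is that the file stays open (and shared for reading only) for the logger's lifetime, which the open-per-call version in the question avoids.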

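But that is the crux of it... that is all you can really do in user mode as a user-mode programmer (in most cases).


#Answer to the second question

As mentioned, there is not much you can do beyond rolling your own API calls or installing a fast HDD (which in some cases may use its own driver). Either way, the operating system already caches and optimizes. If you want to tweak this, you again need to go to the Windows API and/or use the more advanced asynchronous I/O features.

Lastly, different server operating system versions have advantages when it comes to memory and caching, though this is all well documented.


#Answer to the third question

As far as I know (though someone more experienced may be able to elaborate), a process writing to an open file from a multithreaded application gets no preferential treatment (apart from the internal .NET buffers) over multiple processes writing to the same file. In user mode everyone gets the same APIs to work with and goes through the same filter drivers and cache manager. More can be read about the cache manager and its improvements across operating systems.

From MSDN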

Caching occurs under the direction of the cache manager, which operates continuously while Windows is running. File data in the system file cache is written to the disk at intervals determined by the operating system, and the memory previously used by that file data is freed—this is referred to as flushing the cache. The policy of delaying the writing of the data to the file and holding it in the cache until the cache is flushed is called lazy writing, and it is triggered by the cache manager at a determinate time interval. The time at which a block of file data is flushed is partially based on the amount of time it has been stored in the cache and the amount of time since the data was last accessed in a read operation. This ensures that file data that is frequently read will stay accessible in the system file cache for the maximum amount of time.

The amount of I/O performance improvement that file data caching offers depends on the size of the file data block being read or written. When large blocks of file data are read and written, it is more likely that disk reads and writes will be necessary to finish the I/O operation. I/O performance will be increasingly impaired as more of this kind of I/O operation occurs.


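#Addendum

The above is purely an academic discussion. We could dig further into the source of the Windows API; we could follow on into I/O completion ports and kernel mode; we could dissect drivers, but the answer to your question would not become any clearer... beyond what we already know.

If you really want to learn more about how the internals react under stress, the next step you need to take is to run your own benchmarks to gather actual implementation and empirical evidence. You would test .NET and custom parameters at the Windows API level, using the same and different processors, to determine whether the overheads are relevant; use different hardware; try different operating systems (for example servers designed to give different fine-grained control over caching); and use different drivers with different physical paths to your physical devices.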
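A starting point for such a benchmark might look like this. It is a deliberately crude sketch: AppendBenchmarkSketch and its method names are hypothetical, a single Stopwatch run says little without warm-up and repetition, and the numbers will vary with OS version, anti-virus filters and hardware, which is exactly what you would be measuring.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class AppendBenchmarkSketch
{
    // Opens and closes the file for every line, as the code in the question does.
    public static TimeSpan ReopenPerLine(string path, int lines)
    {
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < lines; i++)
        {
            using (var writer = new StreamWriter(path, true))
            {
                writer.WriteLine("line " + i);
            }
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Opens the file once and reuses the same writer for every line.
    public static TimeSpan KeepOpen(string path, int lines)
    {
        var stopwatch = Stopwatch.StartNew();
        using (var writer = new StreamWriter(path, true))
        {
            for (int i = 0; i < lines; i++)
            {
                writer.WriteLine("line " + i);
            }
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Running both against separate temporary files with the same line count, several times and in both orders, gives a first impression of how much the create/open cost matters on a given machine.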


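#Summary

In short, apart from keeping the stream alive (we already know that creating/opening a file carries a penalty), using asynchronous I/O operations, using appropriate buffer sizes, and sending extended parameters directly to the Win32 API, there is really not much more I can see, other than running your own performance tests on different operating systems, configurations and hardware, that would give you more answers about what is more efficient and performant in your situation.

I hope this helps, or at least gives people an interesting journey from the StreamWriter to the API (in these two simple calls).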