Why is Stream.CopyTo faster than FileStream.Write?
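I have a problem and I can't find the reason for it.
I am creating a custom archive file. I use a MemoryStream to store the data, and at the end I use a FileStream to write the data to disk.
My hard disk is an SSD, but the writes are far too slow. When I tried to write just 95 MB to a file, it took 12 seconds!
I tried FileStream.Write and File.WriteAllBytes, but the result is the same.
In the end I had the idea of doing it by copying, and it was 100 times faster!
I need to know why this happens and what is wrong with the write functions.
Here is my code: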
// First of all I create an example 150 MB file
Random randomgen = new Random();
byte[] new_byte_array = new byte[150000000];
randomgen.NextBytes(new_byte_array);
// I turned the byte array into a MemoryStream
MemoryStream file1 = new MemoryStream(new_byte_array);
// HERE I DO SOME THINGS WITH THE MEMORYSTREAM
// Method 1 : File.WriteAllBytes | 13,944 ms
byte[] output = file1.ToArray();
File.WriteAllBytes("output.test", output);
// Method 2 : FileStream | 8,471 ms
byte[] output = file1.ToArray();
FileStream outfile = new FileStream("outputfile", FileMode.Create, FileAccess.ReadWrite);
outfile.Write(output, 0, output.Length);
// Method 3 : FileStream | 147 ms !!!! :|
FileStream outfile = new FileStream("outputfile", FileMode.Create, FileAccess.ReadWrite);
file1.CopyTo(outfile);
Also, file1.ToArray() takes only 90 ms to convert the MemoryStream to bytes.
Why is this happening, and what is the reason and logic behind it?
Update
The comment is right: the performance you gain by enlarging the FileStream internal buffer is lost again when you do the actual Flush. I dug a bit deeper and ran some benchmarks, and it seems that the difference between Stream.CopyTo and FileStream.Write is that Stream.CopyTo uses the I/O buffer more cleverly and improves performance by copying chunk by chunk. In the end, CopyTo uses Write under the hood. The optimum buffer size has been discussed here.
Optimum buffer size is related to a number of things: file system
block size, CPU cache size, and cache latency. Most file systems are
configured to use block sizes of 4096 or 8192. In theory, if you
configure your buffer size so you are reading a few bytes more than
the disk block, the operations with the file system can be extremely
inefficient (i.e. if you configured your buffer to read 4100 bytes at
a time, each read would require 2 block reads by the file system). If
the blocks are already in cache, then you wind up paying the price of
RAM -> L3/L2 cache latency. If you are unlucky and the blocks are not
in cache yet, you pay the price of the disk->RAM latency as well.
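To make the 4100-byte example in the quote above concrete, here is a small sketch of the arithmetic. BlockMath and BlocksTouched are names made up for this illustration, and the model only counts which blocks a byte range overlaps; it ignores caching and read-ahead.

```csharp
using System;

public static class BlockMath
{
    // Hypothetical helper: counts how many file-system blocks of size
    // `blockSize` are overlapped by `length` bytes starting at `offset`.
    public static long BlocksTouched(long offset, int length, int blockSize)
    {
        if (blockSize <= 0) throw new ArgumentOutOfRangeException(nameof(blockSize));
        if (length <= 0) return 0;

        long firstBlock = offset / blockSize;
        long lastBlock = (offset + length - 1) / blockSize;
        return lastBlock - firstBlock + 1;
    }
}
```

With a 4096-byte block, BlockMath.BlocksTouched(0, 4096, 4096) gives 1, while BlockMath.BlocksTouched(0, 4100, 4096) gives 2, which is the case the quote describes.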
So, to answer your question: in your case you are using an unoptimized buffer size with Write and an optimized one with CopyTo, or, better said, Stream itself optimizes that for you.
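As a rough picture of what "copying chunk by chunk" means, here is a minimal sketch of a chunked copy loop. It is not the framework's own code: ChunkedCopy is a name invented here, and the 81920-byte default is, as far as I know, the default copy buffer size in the .NET Framework, so treat it as an assumption.

```csharp
using System;
using System.IO;

public static class ChunkedCopy
{
    // Assumption: 81920 bytes mirrors the default buffer of Stream.CopyTo
    // in the .NET Framework; other runtimes may pick a different size.
    public const int DefaultChunkSize = 81920;

    // Reads from `source` in chunks and forwards each chunk with a plain
    // Write call on `destination`. Returns the number of bytes copied.
    public static long Copy(Stream source, Stream destination, int chunkSize = DefaultChunkSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (destination == null) throw new ArgumentNullException(nameof(destination));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        byte[] buffer = new byte[chunkSize];
        long total = 0;
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

Calling ChunkedCopy.Copy(file1, outfile) would hand the FileStream many medium-sized writes instead of one 150 MB write, which is the difference discussed above.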
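In general, you can also force an unoptimized CopyTo by enlarging the FileStream internal buffer; in that case the result should be comparably slow to the unoptimized Write.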
FileStream outfile = new FileStream("outputfile",
FileMode.Create,
FileAccess.ReadWrite,
FileShare.Read,
150000000); //internal buffer will lead to inefficient disk write
file1.CopyTo(outfile);
outfile.Flush(); //don't forget to flush data to disk
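Original
I analysed the Write methods of FileStream and MemoryStream. The point is that MemoryStream always uses an internal buffer to copy data, and that is extremely fast. FileStream itself has a switch for the case where the requested count >= bufferSize, which is true in your case because you are using the default FileStream buffer, whose default size is 4096. In that case FileStream does not use the buffer at all and instead calls the native Win32Native.WriteFile.
The trick is to force FileStream to use the buffer by overriding the default buffer size. Try this: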
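The switch described above can be pictured with a toy model. This is not the real FileStream source, only a simplified sketch of the decision: ToyBufferedWriter and its counters are invented for illustration, and the real class handles more cases, such as a partially filled buffer.

```csharp
using System;
using System.IO;

// Toy model of a buffered writer with a bypass switch. NOT the actual
// FileStream implementation; it only illustrates count >= bufferSize.
public sealed class ToyBufferedWriter
{
    private readonly Stream _target;
    private readonly byte[] _buffer;
    private int _used;

    public int DirectWrites { get; private set; }
    public int BufferedWrites { get; private set; }

    public ToyBufferedWriter(Stream target, int bufferSize)
    {
        if (target == null) throw new ArgumentNullException(nameof(target));
        if (bufferSize <= 0) throw new ArgumentOutOfRangeException(nameof(bufferSize));
        _target = target;
        _buffer = new byte[bufferSize];
    }

    public void Write(byte[] data, int offset, int count)
    {
        if (count >= _buffer.Length)
        {
            // Request is at least as large as the buffer: skip the buffer
            // and pass the whole request straight to the target.
            Flush();
            _target.Write(data, offset, count);
            DirectWrites++;
            return;
        }

        // Small request: collect it in the buffer, flushing first if full.
        if (_used + count > _buffer.Length) Flush();
        Buffer.BlockCopy(data, offset, _buffer, _used, count);
        _used += count;
        BufferedWrites++;
    }

    public void Flush()
    {
        if (_used == 0) return;
        _target.Write(_buffer, 0, _used);
        _used = 0;
    }
}
```

In this model, a single 150,000,000-byte Write against a 4096-byte buffer takes the direct path, while the same call against a buffer of output.Length + 1 bytes takes the buffered path.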
// Method 2 : FileStream | 8,471 ms
byte[] output = file1.ToArray();
FileStream outfile = new FileStream("outputfile",
FileMode.Create,
FileAccess.ReadWrite,
FileShare.Read,
output.Length + 1); // important, the size of the buffer
outfile.Write(output, 0, output.Length);
N.B. I am not saying this is the optimal buffer size, only explaining what is going on. To find the best buffer size for FileStream, refer to link.
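If you want to test buffer sizes on your own machine, a small timing harness along these lines may help. It is only a sketch: BufferSizeBenchmark and TimeWrite are hypothetical names, the temp file location is arbitrary, and a single run says little because of OS caching, so repeat each measurement several times.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class BufferSizeBenchmark
{
    // Writes `data` to a temporary file in pieces of `chunkSize` bytes,
    // through a FileStream created with the given internal `bufferSize`,
    // and returns the elapsed time including the flush to disk.
    public static TimeSpan TimeWrite(byte[] data, int bufferSize, int chunkSize)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (bufferSize <= 0) throw new ArgumentOutOfRangeException(nameof(bufferSize));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        string path = Path.GetTempFileName();
        try
        {
            Stopwatch watch = Stopwatch.StartNew();
            using (var outfile = new FileStream(path, FileMode.Create,
                FileAccess.Write, FileShare.None, bufferSize))
            {
                for (int offset = 0; offset < data.Length; offset += chunkSize)
                {
                    int count = Math.Min(chunkSize, data.Length - offset);
                    outfile.Write(data, offset, count);
                }
                outfile.Flush(true); // also ask the OS to flush its file buffers
            }
            watch.Stop();

            if (new FileInfo(path).Length != data.Length)
                throw new IOException("File length does not match the input.");
            return watch.Elapsed;
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

For example, compare TimeWrite(output, 4096, output.Length) (one big write, as in Method 2) with TimeWrite(output, 4096, 81920) (chunked writes) and with other buffer sizes, and see which combination is fastest on your disk.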