Is it safe to use FileStream.Seek in this way?

Suppose I have a file format made up of a series of objects, where each object has a header in the following format:

public struct FileObjectHeader {
    //The type of the object (not important for this question, but it exists)
    public byte TypeID;
    //The length of the object's data, which DOES NOT include the size of the header.
    public UInt16 Length;
}

The header is followed by data of the specified length.

I read this data by first building a list that records each object's header together with the location of its data:

struct FileObjectReference {
    public FileObjectHeader Header;
    public long Location;
}
public List<FileObject> ReadObjects(Stream s) {
    List<FileObjectReference> objectRefs = new List<FileObjectReference>();

    try {
        while (true) {
            FileObjectHeader header = ReadObjectHeader(s); 
            //The above advances the stream by the size of the header as well.
            FileObjectReference reference = new FileObjectReference() { Header = header, Location = s.Position };
            objectRefs.Add(reference);
            //Advance the stream to the next object's header.
            s.Seek(header.Length, SeekOrigin.Current);
        }
    } catch (EndOfStreamException) {
        //Do nothing as this is an expected case
    }

    //Now we'd read all of the objects that we've previously located.
    //This code isn't too important for the question but I'm including it for reference.
    List<FileObject> objects = new List<FileObject>();
    foreach (var reference in objectRefs) {
        s.Seek(reference.Location, SeekOrigin.Begin);

        objects.Add(ReadObject(reference.Header, s));
    }

    return objects;
}
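
ReadObjectHeader is not shown above. A minimal sketch of what it might look like, assuming Length is stored little-endian and that a short read should surface as EndOfStreamException (both are assumptions, as is the HeaderReader class name):

```csharp
using System;
using System.IO;

// Same fields as the header struct in the question.
public struct FileObjectHeader {
    public byte TypeID;
    public UInt16 Length;
}

public static class HeaderReader {
    // Size of the header on disk: 1 byte TypeID + 2 bytes Length.
    public const int HeaderSize = 3;

    public static FileObjectHeader ReadObjectHeader(Stream s) {
        byte[] buffer = new byte[HeaderSize];
        int total = 0;
        // Stream.Read may return fewer bytes than requested, so loop until
        // the header is complete or the stream reports end of stream (0).
        while (total < HeaderSize) {
            int read = s.Read(buffer, total, HeaderSize - total);
            if (read == 0) {
                throw new EndOfStreamException();
            }
            total += read;
        }
        return new FileObjectHeader {
            TypeID = buffer[0],
            // Assumption: Length is little-endian on disk.
            Length = (UInt16)(buffer[1] | (buffer[2] << 8))
        };
    }
}
```

With a reader like this, the `while (true)` loop ends when a header read hits the end of the stream. The reader itself only calls Read, never Write.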

A few notes:

My question is:

Since I'm using FileStream.Seek, can seeking take me past the end of the stream and extend the file indefinitely? According to the documentation:

You can seek to any location beyond the length of the stream. When you seek beyond the length of the file, the file size grows. In Windows NT and later versions, data added to the end of the file is set to zero. In Windows 98 or earlier versions, data added to the end of the file is not set to zero, which means that previously deleted data is visible to the stream.

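The way this reads, it seems the file could be extended without my intending it, so that merely reading the 3-byte headers would produce an ever-growing file. In practice this doesn't seem to happen, but I'd like confirmation that it can't.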

The documentation for FileStream.Read(), however, says:

Return Value
Type: System.Int32
The total number of bytes read into the buffer. This might be less than the number of bytes requested if that number of bytes are not currently available, or zero if the end of the stream is reached.

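So I strongly suspect (but do verify this yourself) that the extension only applies when you write to the file after seeking. That makes sense: if you know you will need the space, you can reserve it without actually writing anything (which would be slow).

When reading, however, my guess is that you get 0 as the return value and no data is read. The file is not extended either.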

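To answer your question simply: the code below will not make your file grow. It will, however, throw a new EndOfStreamException(). Only writing at a position beyond the end of the file makes the file grow. When the file grows, the region between the old end of file and the start of the write is filled with zeros (unless the sparse flag is enabled, in which case it is marked as unallocated).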

using (var fileStream = new FileStream("f", FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None))
{
    var buffer = new byte[10];
    fileStream.Seek(10, SeekOrigin.Begin);
    var bytesRead = fileStream.Read(buffer, 0, 10);
    if (bytesRead == 0) {
        throw new EndOfStreamException();
    }
}
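
To check this directly, here is a small sketch (the temp file and the `SeekDemo` name are purely illustrative) that compares the file length after a seek-and-read past the end with the length after a seek-and-write:

```csharp
using System;
using System.IO;

public static class SeekDemo {
    // Reports what Read returned past the end of the file, the file length
    // after that read, and the file length after writing one byte at offset 10.
    public static (int BytesRead, long AfterRead, long AfterWrite) Run() {
        string path = Path.GetTempFileName(); // creates an empty file
        try {
            using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None)) {
                var buffer = new byte[10];

                // Seek past the end and read: there is nothing to read.
                fs.Seek(10, SeekOrigin.Begin);
                int bytesRead = fs.Read(buffer, 0, buffer.Length);
                long afterRead = fs.Length;

                // Seek past the end and write: the write extends the file.
                fs.Seek(10, SeekOrigin.Begin);
                fs.WriteByte(0xFF);
                fs.Flush();
                long afterWrite = fs.Length;

                return (bytesRead, afterRead, afterWrite);
            }
        } finally {
            File.Delete(path);
        }
    }
}
```

I would expect `SeekDemo.Run()` to return `(0, 0, 11)`: the read sees no data and leaves the length at 0, while the single byte written at offset 10 makes the file 11 bytes long.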

Since you are reading and writing structured binary data, I suggest three things:

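  1. Your structured binary data should fit a whole number of elements into each disk block. On most systems the block size is 4096 bytes (see MSDN). Doing so lets the CLR read data straight from the file system cache into your buffer.
  2. Use a MemoryMappedFile and unsafe pointers to access your data (if your app will run on Windows only). You can also use a ViewAccessor, but you may find it slower than doing the caching yourself, because the interop layer makes extra copies. If you go the unsafe route, here is code that quickly copies your structs into a byte array: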

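    // Requires: using System; using System.Runtime.InteropServices; and compiling with /unsafe.
    internal static class Native
    {
        // CopyMemory is a header macro; the function kernel32.dll exports is RtlMoveMemory.
        [DllImport("kernel32.dll", EntryPoint = "RtlMoveMemory", SetLastError = false)]
        private static unsafe extern void CopyMemory(void* dest, void* src, UIntPtr count);

        // Copies an array of blittable structs into a new byte array.
        private static unsafe byte[] Serialize(TestStruct[] index)
        {
            var buffer = new byte[Marshal.SizeOf(typeof(TestStruct)) * index.Length];
            fixed (void* src = &index[0])
            {
                fixed (void* dest = &buffer[0])
                {
                    // Destination first, then source.
                    CopyMemory(dest, src, (UIntPtr)(uint)buffer.Length);
                }
            }

            return buffer;
        }
    }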
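
For comparison, the ViewAccessor route mentioned in point 2 needs no unsafe code. A minimal sketch, assuming the 3-byte header layout from the question with a little-endian Length, running on a little-endian machine; `MappedHeaderScan` and `ListObjectOffsets` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.MemoryMappedFiles;

public static class MappedHeaderScan {
    const int HeaderSize = 3; // 1 byte TypeID + 2 bytes Length

    // Returns the offset of each object's data (the byte after its header).
    public static List<long> ListObjectOffsets(string path) {
        var offsets = new List<long>();
        long fileLength = new FileInfo(path).Length;
        if (fileLength == 0) return offsets; // an empty file cannot be mapped

        // Map the file read-only, so nothing here can change its size.
        using (var mmf = MemoryMappedFile.CreateFromFile(
                   path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        using (var view = mmf.CreateViewAccessor(0, fileLength, MemoryMappedFileAccess.Read)) {
            long pos = 0;
            // Bounds are checked against the real file length, because a
            // view's Capacity can be rounded up to a page boundary.
            while (pos + HeaderSize <= fileLength) {
                // The TypeID byte sits at pos; Length follows at pos + 1.
                ushort length = view.ReadUInt16(pos + 1);
                long dataStart = pos + HeaderSize;
                if (dataStart + length > fileLength) break; // truncated object
                offsets.Add(dataStart);
                pos = dataStart + length;
            }
        }
        return offsets;
    }
}
```

The fields are read one at a time rather than as a whole struct because a struct with a byte followed by a UInt16 is padded to 4 bytes under the default layout, which would not match a 3-byte header on disk.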