Compress large file using SharpZipLib causing Out Of Memory Exception

Question

I have a 453MB XML file which I'm trying to compress to a ZIP using SharpZipLib.

Below is the code I'm using to create the zip, but it throws an OutOfMemoryException. The same code successfully compressed a 428MB file.

Any idea why the exception is happening? I can't see a reason, since my system has plenty of memory available.

public void CompressFiles(List<string> pathnames, string zipPathname)
{
    try
    {
        using (FileStream stream = new FileStream(zipPathname, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            using (ZipOutputStream stream2 = new ZipOutputStream(stream))
            {
                foreach (string str in pathnames)
                {
                    FileStream stream3 = new FileStream(str, FileMode.Open, FileAccess.Read, FileShare.Read);
                    byte[] buffer = new byte[stream3.Length];
                    try
                    {
                        if (stream3.Read(buffer, 0, buffer.Length) != buffer.Length)
                        {
                            throw new Exception(string.Format("Error reading '{0}'.", str));
                        }
                    }
                    finally
                    {
                        stream3.Close();
                    }
                    ZipEntry entry = new ZipEntry(Path.GetFileName(str));
                    stream2.PutNextEntry(entry);
                    stream2.Write(buffer, 0, buffer.Length);
                }
                stream2.Finish();
            }
        }
    }
    catch (Exception)
    {
        File.Delete(zipPathname);
        throw;
    }
}

Answer 1

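You're allocating a lot of memory for no good reason, and I bet you have a 32-bit process. Under normal conditions a 32-bit process can only allocate up to 2GB of virtual memory, and the library surely allocates memory of its own as well.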
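
To check the guess about bitness, something like the following could be run inside the affected process. This is a minimal sketch: the class name BitnessCheck is made up for illustration, while IntPtr.Size and Environment.Is64BitProcess are standard .NET members (the latter available since .NET 4.0).

```csharp
using System;

public static class BitnessCheck
{
    // Returns a short description of the current process's address width.
    public static string Describe()
    {
        // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit one.
        int pointerBits = IntPtr.Size * 8;
        return string.Format("{0}-bit process (Is64BitProcess = {1})",
            pointerBits, Environment.Is64BitProcess);
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

If this reports a 32-bit process, a single 453MB array needs one contiguous block of address space, which may not be available even when the machine has plenty of free memory overall.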

Anyway, several things are wrong here:

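byte[] buffer = new byte[stream3.Length];

Why? You don't need to hold the whole file in memory to process it.

if (stream3.Read(buffer, 0, buffer.Length) != buffer.Length)

This one is nasty. Stream.Read is explicitly allowed to return fewer bytes than you asked for, and that is still a valid result. When reading a stream into a buffer, you have to call Read repeatedly until the buffer is full or the end of the stream is reached.

Your variables should have more meaningful names. It's easy to get lost among stream2, stream3, and so on.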
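
For the Read point above, the "call Read repeatedly" rule could be sketched roughly as follows. ReadFully and StreamHelpers are hypothetical names introduced for illustration, not part of SharpZipLib or the question's code, and a MemoryStream stands in for the file.

```csharp
using System;
using System.IO;

public static class StreamHelpers
{
    // Hypothetical helper: keeps calling Read until 'count' bytes have
    // arrived or the stream ends. Returns the number of bytes actually read.
    public static int ReadFully(Stream source, byte[] buffer, int offset, int count)
    {
        int total = 0;
        while (total < count)
        {
            int read = source.Read(buffer, offset + total, count - total);
            if (read == 0)
            {
                break; // end of stream reached before 'count' bytes arrived
            }
            total += read;
        }
        return total;
    }

    public static void Main()
    {
        byte[] data = { 1, 2, 3, 4, 5 };
        using (var source = new MemoryStream(data))
        {
            byte[] buffer = new byte[8];
            // Only 5 bytes exist, so the helper stops at end of stream.
            int got = ReadFully(source, buffer, 0, buffer.Length);
            Console.WriteLine(got); // prints 5
        }
    }
}
```

A helper like this only matters when a buffer really has to be filled completely. For zipping a file, copying it through in chunks avoids the need altogether.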

A simple solution would be:

using (var zipFileStream = new FileStream(zipPathname, FileMode.Create, FileAccess.Write, FileShare.None))
using (ZipOutputStream zipStream = new ZipOutputStream(zipFileStream))
{
    foreach (string str in pathnames)
    {
        using(var itemStream = new FileStream(str, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            var entry = new ZipEntry(Path.GetFileName(str));
            zipStream.PutNextEntry(entry);
            itemStream.CopyTo(zipStream);
        }
    }
    zipStream.Finish();
}

Answer 2

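You're trying to create a buffer as big as the file. Instead, make the buffer a fixed size, read some bytes into it, and write the number of bytes actually read to the zip file.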

Here's your code with a 4096-byte buffer (and some cleanup):

public static void CompressFiles(List<string> pathnames, string zipPathname)
{
    const int BufferSize = 4096;
    byte[] buffer = new byte[BufferSize];

    try
    {
        using (FileStream stream = new FileStream(zipPathname,
            FileMode.Create, FileAccess.Write, FileShare.None))
        using (ZipOutputStream stream2 = new ZipOutputStream(stream))
        {
            foreach (string str in pathnames)
            {
                using (FileStream stream3 = new FileStream(str,
                    FileMode.Open, FileAccess.Read, FileShare.Read))
                {
                    ZipEntry entry = new ZipEntry(Path.GetFileName(str));
                    stream2.PutNextEntry(entry);

                    int read;
                    while ((read = stream3.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        stream2.Write(buffer, 0, read);
                    }
                }
            }
            stream2.Finish();
        }
    }
    catch (Exception)
    {
        File.Delete(zipPathname);
        throw;
    }
}

Note this block in particular:

const int BufferSize = 4096;
byte[] buffer = new byte[BufferSize];
// ...
int read;
while ((read = stream3.Read(buffer, 0, buffer.Length)) > 0)
{
    stream2.Write(buffer, 0, read);
}

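This reads bytes into buffer. When there are no more bytes, the Read() method returns 0, so that's when we stop. When Read() succeeds, we can be sure there is some data in the buffer, but we don't know how many bytes. It might fill the whole buffer, or only a small part of it. So we use the number of bytes read, read, to determine how many bytes to write to the ZipOutputStream.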

By the way, that block of code can be replaced with a simple statement that was added in .NET 4.0 and does the same thing:

stream3.CopyTo(stream2);
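
Stream.CopyTo also has an overload that takes a buffer size, in case the default does not suit. A minimal sketch, with in-memory streams standing in for the file and zip streams (CopyToDemo and CopyAll are names made up for this illustration):

```csharp
using System;
using System.IO;

public static class CopyToDemo
{
    // Copies 'input' through Stream.CopyTo using the given buffer size
    // and returns whatever arrived at the destination.
    public static byte[] CopyAll(byte[] input, int bufferSize)
    {
        using (var source = new MemoryStream(input))
        using (var destination = new MemoryStream())
        {
            // CopyTo loops internally, reading at most bufferSize bytes at a time.
            source.CopyTo(destination, bufferSize);
            return destination.ToArray();
        }
    }

    public static void Main()
    {
        byte[] input = new byte[10000];
        byte[] output = CopyAll(input, 4096);
        Console.WriteLine(output.Length); // prints 10000
    }
}
```

Whatever the buffer size, memory use stays bounded by the buffer rather than by the size of the file being copied.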

So your code could become:

public static void CompressFiles(List<string> pathnames, string zipPathname)
{
    try
    {
        using (FileStream stream = new FileStream(zipPathname,
            FileMode.Create, FileAccess.Write, FileShare.None))
        using (ZipOutputStream stream2 = new ZipOutputStream(stream))
        {
            foreach (string str in pathnames)
            {
                using (FileStream stream3 = new FileStream(str,
                    FileMode.Open, FileAccess.Read, FileShare.Read))
                {
                    ZipEntry entry = new ZipEntry(Path.GetFileName(str));
                    stream2.PutNextEntry(entry);

                    stream3.CopyTo(stream2);
                }
            }
            stream2.Finish();
        }
    }
    catch (Exception)
    {
        File.Delete(zipPathname);
        throw;
    }
}

Now you know why you got the error, and how to use buffers.
