Reading huge files in C#

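I have to read large files of 4-10 GB line by line. The problem is that once I have read about 2 GB, the .NET process throws an OutOfMemory exception.

At first I was only trying to count the lines, but I will need to access each line individually to remove some data from it.

From what I have read, every option keeps the previous lines in memory. I only want it to keep the line currently being read (unless somebody knows a trick for keeping all of them).

Here is what I have tried, along with a few similar things: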

long count = 0;
string line;
// The using statement disposes the reader even if ReadLine throws.
using (StreamReader reader = File.OpenText(FilePath))
{
    while ((line = reader.ReadLine()) != null)    // This is where it errors
    {
        count++;
    }
}

The exception is:

Exception of type 'System.OutOfMemoryException' was thrown.
at System.Text.StringBuilder.ExpandByABlock(Int32 minBlockCharCount)
at System.Text.StringBuilder.Append(Char* value, Int32 valueCount)
at System.Text.StringBuilder.Append(Char[] value, Int32 startIndex, Int32  charCount)
at System.IO.StreamReader.ReadLine()
at CSV.Program.NumLines() in C:\Users\ted\Documents\Visual Studio 2015\Projects\vConnect\CSV\CSV\Program.cs:line 100
 at CSV.Program.Main(String[] args) in C:\Users\ted\Documents\Visual Studio 2015\Projects\vConnect\CSV\CSV\Program.cs:line 20
at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()

Thanks

You can use the methods of the FileStream class: FileStream.Read and FileStream.Seek should allow you to do what you need. An example can be found here: http://www.codeproject.com/Questions/543821/ReadplusBytesplusfromplusLargeplusBinaryplusfilepl

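You will have to modify it a bit, but basically you can start at position 0, read until you find a newline character, process that line, then continue from where you stopped and repeat. It won't be very efficient, but it will get the job done.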
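A minimal sketch of that loop, not taken from the linked example. It assumes the file is UTF-8 text without a byte order mark and that lines end in '\n' or "\r\n". The names ChunkedLineReader, ForEachLine and bufferSize are made up for illustration. Seek is not called explicitly, because each FileStream.Read already advances the position to where the previous read stopped.

```csharp
using System;
using System.IO;
using System.Text;

static class ChunkedLineReader
{
    // Hypothetical helper: reads `path` in fixed-size chunks and calls
    // `onLine` once per line. Assumes UTF-8 text with '\n' or "\r\n" endings.
    public static long ForEachLine(string path, Action<string> onLine,
                                   int bufferSize = 1 << 20)
    {
        long count = 0;
        var buffer = new byte[bufferSize];
        // Holds the bytes of a line that spans more than one chunk.
        var pending = new MemoryStream();

        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                int start = 0;
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] != (byte)'\n') continue;
                    // Found a newline: finish the current line and hand it over.
                    pending.Write(buffer, start, i - start);
                    onLine(Decode(pending));
                    count++;
                    start = i + 1;
                }
                // Keep the unfinished tail for the next chunk.
                pending.Write(buffer, start, read - start);
            }
        }

        // The last line may not end with a newline.
        if (pending.Length > 0)
        {
            onLine(Decode(pending));
            count++;
        }
        return count;
    }

    // Converts the collected bytes to a string, strips trailing '\r'
    // characters, and resets the stream for the next line.
    private static string Decode(MemoryStream pending)
    {
        string line = Encoding.UTF8.GetString(
            pending.GetBuffer(), 0, (int)pending.Length);
        pending.SetLength(0);
        return line.TrimEnd('\r');
    }
}
```

It could be called as `long n = ChunkedLineReader.ForEachLine(FilePath, line => { /* process line */ });`. Note that this still holds one complete line in memory at a time. The stack trace in the question fails while ReadLine is growing its StringBuilder, which may mean the file contains an extremely long line or a line terminator the reader does not recognize. If so, this sketch would need a cap on the pending buffer as well.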

Hope this helps.