Splitting a large file by inserting carriage returns after a given string

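I'm new to VB.NET, but a friend recommended it for what I'm trying to do. I have a huge text file, and I want to insert a carriage return after a specific string.

Apart from the mess below, how can I change this so it reads the file and inserts a line break every time it sees the text "ext"? I expect a single line of the input file to turn into a large number of lines.

What I've managed to cobble together so far reads the input file line by line until the end and writes it back out to another file.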

    Imports System.IO

    Module Module1
        Sub Main()
            Try
                ' Create an instance of StreamReader to read from a file.
                ' The Using statement also closes the StreamReader.
                Using sr As StreamReader = New StreamReader("C:\My Documents\input.txt")
                    Dim line As String

                    ' Copy the file line by line until the end of
                    ' the file is reached.
                    Using sw As StreamWriter = New StreamWriter("C:\My Documents\output.txt")
                        Do Until sr.EndOfStream
                            line = sr.ReadLine()
                            sw.WriteLine(line)
                        Loop
                    End Using
                End Using
                Console.WriteLine("done")
            Catch e As Exception
                ' Let the user know what went wrong.
                Console.WriteLine("The file could not be read:")
                Console.WriteLine(e.Message)
            End Try
            Console.ReadKey()
        End Sub
    End Module

Updated based on the comments. This fails on a 500 MB file because it runs out of memory:

    Sub Main()
        Try
            ' Create an instance of StreamReader to read from a file.
            ' The Using statement also closes the StreamReader.
            Using sr As StreamReader = New StreamReader("C:\My Documents\input.txt")
                Dim line As String
                Dim term As String = "</ext>"

                ' Copy the file line by line, adding a line break after
                ' every occurrence of term.
                Using sw As StreamWriter = New StreamWriter("C:\My Documents\output.txt")
                    Do Until sr.EndOfStream
                        ' ReadLine loads the entire line into a single string
                        line = sr.ReadLine()
                        line = line.Replace(term, term & Environment.NewLine)
                        sw.WriteLine(line)
                    Loop
                End Using
            End Using
            Console.WriteLine("done")
        Catch e As Exception
            Console.WriteLine("The file could not be read:")
            Console.WriteLine(e.Message)
        End Try
        Console.ReadKey()
    End Sub

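Because your lines are so long (ReadLine has to return each whole line as one string, and that is what runs out of memory), you have to:

  • Read and write one character at a time
  • Keep the last x characters, where x is the length of your search term
  • If the last x characters equal your search term, write a newline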

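    Dim term As String = "</ext>"
    ' The last term.Length characters written, initially all spaces
    Dim lastChars As String = "".PadRight(term.Length)
    ' One-character buffer (in VB, Dim buffer(0) declares a single element)
    Dim buffer(0) As Char

    Using sw As New System.IO.StreamWriter("C:\My Documents\output.txt")
        Using sr As New System.IO.StreamReader("C:\My Documents\input.txt")
            While Not sr.EndOfStream
                ' Read one character; Read returns how many were actually read
                If sr.Read(buffer, 0, 1) = 0 Then Exit While

                ' Slide the window: append the new character, drop the oldest
                lastChars &= buffer(0)
                lastChars = lastChars.Remove(0, 1)

                sw.Write(buffer(0))

                ' The window equals the term, so the term was just written: break the line
                If lastChars = term Then
                    sw.Write(Environment.NewLine)
                End If
            End While
        End Using
    End Using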
    

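Note: StreamReader.Read returns decoded characters, not bytes, so this also works for Unicode files. By default the reader detects a Unicode encoding from the byte-order mark and otherwise assumes UTF-8 (the output is written as UTF-8). For a file in a legacy code page, pass the matching Encoding to both the StreamReader and StreamWriter constructors.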
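Reading one character per call keeps memory flat, but it does a lot of per-character work. Below is a sketch of the same idea that reads fixed-size blocks instead. `ChunkedSplitter`, `SplitAfterTerm` and the 64K default block size are illustrative names and values, not part of the original answer. The sketch assumes occurrences of the term are non-overlapping (as with `String.Replace`), which holds for `"</ext>"`. Up to `term.Length - 1` trailing characters of each block are held back in case a term is split across two blocks, so memory use stays around one block regardless of line length.

```vbnet
Imports System
Imports System.IO

Module ChunkedSplitter
    ' Hypothetical helper: copies input to output, writing a line break after
    ' each non-overlapping occurrence of term. Only one block of characters plus
    ' a short carry-over is held in memory, however long the lines are.
    Sub SplitAfterTerm(input As TextReader, output As TextWriter, term As String,
                       Optional chunkSize As Integer = 65536)
        If String.IsNullOrEmpty(term) Then Throw New ArgumentException("term must not be empty", "term")
        If chunkSize < 1 Then Throw New ArgumentOutOfRangeException("chunkSize")

        Dim buffer(chunkSize - 1) As Char
        ' Tail of the previous block that might be the start of a term
        Dim carry As String = ""

        Do
            Dim count As Integer = input.Read(buffer, 0, buffer.Length)
            If count = 0 Then Exit Do ' end of input

            Dim text As String = carry & New String(buffer, 0, count)
            Dim start As Integer = 0
            Dim hit As Integer = text.IndexOf(term, start, StringComparison.Ordinal)
            While hit >= 0
                ' Write everything up to and including the term, then break the line
                Dim afterTerm As Integer = hit + term.Length
                output.Write(text.Substring(start, afterTerm - start))
                output.WriteLine()
                start = afterTerm
                hit = text.IndexOf(term, start, StringComparison.Ordinal)
            End While

            ' Hold back at most term.Length - 1 characters: a term starting any
            ' earlier would have fit in this block and been found above.
            Dim keep As Integer = Math.Min(term.Length - 1, text.Length - start)
            output.Write(text.Substring(start, text.Length - start - keep))
            carry = text.Substring(text.Length - keep)
        Loop

        ' Flush whatever was held back when the input ended
        output.Write(carry)
    End Sub
End Module
```

With the reader and writer from the answer's Using blocks, the whole While loop becomes `SplitAfterTerm(sr, sw, "</ext>")`.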