Merge two big files

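I need to join two text files together. I don't want to simply append one to the other. Instead, I want to take lines from the first file until a certain word is found, then do the same with the second file, then go back to the first and keep alternating like this until I run out of lines in both files.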

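I have the following code. It works for files of about 50k lines (although it takes a long time), but the files I need to merge have about 2 million lines.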

    Private Sub Juntar_Click(sender As Object, e As EventArgs) Handles Juntar.Click
        Gravar.Enabled = False
        System.IO.File.Delete("c:\temp\tempfile.txt")

        Do Until (Prog_1_Button.Enabled = True And Prog_2_Button.Enabled = True)
            While Not (Prog_1_Button.Enabled)
                lines = System.IO.File.ReadAllLines(file1).ToList
                arrayLines = lines.ToArray
                Dim i As Integer = lines.IndexOf(Array.Find(arrayLines, Function(x) (x.Contains("teste"))))
                saida = lines.GetRange(0, i + 1)
                lines.RemoveRange(0, i + 1)

                System.IO.File.WriteAllLines(file1, lines)
                If i >= 0 Then
                    Prog_Bar.Value = Prog_Bar.Value + i
                    Exit While
                Else
                    saida = lines
                    Prog_1_Button.Enabled = True
                End If
            End While
            System.IO.File.AppendAllLines("c:\temp\tempfile.txt", saida)
            saida.Clear()
            While Not (Prog_2_Button.Enabled)
                lines = System.IO.File.ReadAllLines(file2).ToList
                arrayLines = lines.ToArray
                Dim i As Integer = lines.IndexOf(Array.Find(arrayLines, Function(x) (x.Contains("teste"))))
                saida = lines.GetRange(0, i + 1)
                lines.RemoveRange(0, i + 1)

                System.IO.File.WriteAllLines(file2, lines)
                If i >= 0 Then
                    Prog_Bar.Value = Prog_Bar.Value + i
                    Exit While
                Else
                    saida = lines
                    Prog_2_Button.Enabled = True
                End If
            End While

            System.IO.File.AppendAllLines("c:\temp\tempfile.txt", saida)
            saida.Clear()
        Loop
        Gravar.Enabled = True

    End Sub

Example:

**file_1:**
aaa1
bbb1
**teste**1
ccc1
ddd1
**teste**1


**file_2:**
aaa2
bbb2
**teste**2
ccc2
ddd2
**teste**2

**output:**
aaa1
bbb1
**teste**1
aaa2
bbb2
**teste**2
ccc1
ddd1
**teste**1
ccc2
ddd2
**teste**2

This code should do it (and it is simpler):

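I wrote it as a console application, so you may need to add the GUI parts back. Or maybe you shouldn't, since updating the GUI is one of the things that slows the application down.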

Imports System
Imports System.IO

Module Program
    Sub Main(args As String())
        ' Read both input files fully into memory.
        Dim lines1() As String = File.ReadAllLines("d:\temp\input1.txt")
        Dim lines2() As String = File.ReadAllLines("d:\temp\input2.txt")

        ' Using makes sure the writer is flushed and closed even if an exception occurs.
        Using newfile As New StreamWriter("d:\temp\output.txt", False)
            Dim i2 As Integer = 0 ' index of the next unread line in lines2
            For i1 As Integer = 0 To lines1.Length - 1
                newfile.WriteLine(lines1(i1))
                If lines1(i1).Contains("teste") Then
                    ' Copy lines from file 2 up to and including its next "teste" line.
                    For j As Integer = i2 To lines2.Length - 1
                        newfile.WriteLine(lines2(j))
                        i2 = j + 1
                        If lines2(j).Contains("teste") Then
                            Exit For
                        End If
                    Next
                End If
            Next
            ' File 1 is exhausted: append whatever is left of file 2.
            For j As Integer = i2 To lines2.Length - 1
                newfile.WriteLine(lines2(j))
            Next
        End Using
    End Sub
End Module
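`File.ReadAllLines` keeps both files entirely in memory. If that turns out to be a problem at around 2 million lines, a streaming variant can read one line at a time from each input and write it out immediately. The following is a sketch under that assumption, not a tested drop-in replacement. `CopyChunk` and `InterleaveStreams` are hypothetical names; they accept any `TextReader`/`TextWriter`, so the demo in `Main` runs on in-memory strings instead of the real files.

```vbnet
Imports System
Imports System.IO

Module StreamingMerge
    ' Hypothetical helper: copy lines from src to dest up to and including the
    ' next line that contains marker. Returns False once src is exhausted.
    Function CopyChunk(src As TextReader, dest As TextWriter, marker As String) As Boolean
        Dim line As String = src.ReadLine()
        While line IsNot Nothing
            dest.WriteLine(line)
            If line.Contains(marker) Then Return True
            line = src.ReadLine()
        End While
        Return False
    End Function

    ' Alternate chunks from input1 and input2 (input1 first) until both are exhausted.
    Sub InterleaveStreams(input1 As TextReader, input2 As TextReader, output As TextWriter, marker As String)
        Dim more1 As Boolean = True
        Dim more2 As Boolean = True
        While more1 OrElse more2
            If more1 Then more1 = CopyChunk(input1, output, marker)
            If more2 Then more2 = CopyChunk(input2, output, marker)
        End While
    End Sub

    Sub Main()
        ' Demo on the question's example data held in strings.
        Dim file1 As String = String.Join(Environment.NewLine, {"aaa1", "bbb1", "teste1", "ccc1", "ddd1", "teste1"})
        Dim file2 As String = String.Join(Environment.NewLine, {"aaa2", "bbb2", "teste2", "ccc2", "ddd2", "teste2"})
        Using output As New StringWriter()
            InterleaveStreams(New StringReader(file1), New StringReader(file2), output, "teste")
            Console.Write(output.ToString())
        End Using
    End Sub
End Module
```

For the real files, pass `StreamReader` instances opened on the two input paths and a `StreamWriter` on the output path (each inside a `Using` block) instead of the `StringReader`/`StringWriter` used in the demo. Because every line is written as soon as it is read, memory use should not grow with the size of the files.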