Remove Repeated Words from a Text File

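I have a text file containing nearly 45,000 words, one word per line. Thousands of these words appear more than 10 times. I want to create a new file with no repeated words. I am using a StreamReader, but it only reads the file once. How can I get rid of the repeated words? Please help. Thanks. My code looks like this: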

    Try
        File.OpenText(TextBox1.Text)
    Catch ex As Exception
        MsgBox(ex.Message)
        Exit Sub
    End Try

    Dim line As String = String.Empty
    Dim OldLine As String = String.Empty
    Dim sr = File.OpenText(TextBox1.Text)

    line = sr.ReadLine
    OldLine = line

    Do While sr.Peek <> -1
        Application.DoEvents()
        line = sr.ReadLine
        If OldLine <> line Then
            My.Computer.FileSystem.WriteAllText(My.Computer.FileSystem.SpecialDirectories.Desktop & "\Splitted File without Repeats.txt", line & vbCrLf, True)
        End If

        OldLine = line
    Loop


    sr.Close()
    System.Diagnostics.Process.Start(My.Computer.FileSystem.SpecialDirectories.Desktop & "\Splitted File without Repeats.txt")
    MsgBox("Loop terminated. Stream Reader Closed." & vbCrLf)

You can use LINQ's Distinct() method for this.

This works for smaller files, since the whole file is read into memory:

' Requires Imports System.IO and Imports System.Linq. Note that this overwrites the input file.
Dim lines As String() = File.ReadAllLines("yourfile.txt")
File.WriteAllLines("yourfile.txt", lines.Distinct().ToArray())
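
For larger files, or when the result should go to a new file as the question asks, a streaming variant may be preferable. The code in the question compares each line only with the line immediately before it, so it removes only adjacent duplicates. The sketch below instead remembers every word seen so far in a HashSet(Of String). The module name DedupSketch and the function WriteDistinctLines are hypothetical names chosen for illustration, and the comparison is assumed to be case-sensitive (ordinal).

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO

Module DedupSketch
    ' Hypothetical helper: copies each distinct line of inputPath to outputPath,
    ' keeping the first occurrence and preserving the original order.
    ' Returns the number of lines written.
    Function WriteDistinctLines(inputPath As String, outputPath As String) As Integer
        ' Assumption: case-sensitive matching. Use StringComparer.OrdinalIgnoreCase
        ' if "Word" and "word" should count as the same word.
        Dim seen As New HashSet(Of String)(StringComparer.Ordinal)
        Dim written As Integer = 0

        Using reader As New StreamReader(inputPath)
            Using writer As New StreamWriter(outputPath, False)
                Dim line As String = reader.ReadLine()
                While line IsNot Nothing
                    ' HashSet.Add returns False when the value is already present.
                    If seen.Add(line) Then
                        writer.WriteLine(line)
                        written += 1
                    End If
                    line = reader.ReadLine()
                End While
            End Using
        End Using

        Return written
    End Function
End Module
```

The output file is opened once and written as the input is read, rather than reopened for every line as WriteAllText does in the question's loop. The input and output paths should be different files, because the input is still being read while the output is written.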