Remove duplicate data in a CSV file

Question

I'm writing a script to remove duplicates from a 12,000-line CSV file. I know this file contains duplicates on the user ID and/or on the card_number. It is formatted like this:

userid  firstname  lastname  card_number
----------------------------------------
1234    toto       help      111111
1234    toto       help      111111

and

1234    toto       help      111111
5678    user       user2     111111

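I would like to read the file line by line, add the lines to a dictionary object if they are already in it, then write the remaining lines to another file and export the dictionary to a log file.

The functions that create, open, write and save files with the fso object are working.

I cannot get anything back from the dictionary methods; the dictionary does not seem to work.

I don't know how to export my dictionary, or maybe that is only because the dictionary isn't working.

I did a lot of research on Stack Overflow, ssh64 and expert-exchange to find a solution, but I'm stuck. I think my script is almost finished, and any help would be appreciated.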

'
'This is the dictionary part to record duplicates
'in a file and remove them from the destination file
'

'# Declares required variables
Dim objFSO, objFolder, objShell, objTextFile, objFile
Dim strDirectory, CurDir, InputFile, OutputFile 
Dim strInput, strFile
Dim dictionary, it

'# Here we go !
Set objFSO = Createobject("Scripting.FileSystemobject") 
Set OutputFile = objFSO.CreateTextFile(CurDir & ".\myCSVfile.csv", 2, true)
Set objFile = objFSO.OpenTextFile(CurDir & InputFile, 1)

'# Reads the file until the end
Do Until objFile.AtEndOfStream

    strInput = objFile.ReadLine()
    strInput = Trim(strInput)
    If Len(strInput) > 0 Then
        'WScript.Echo strInput
        'OutputLog.Writeline strInput
        'Quit
    End If

    '# Test if it already exists, if YES, it's a duplicate
    If Not dictionary.exists(strInput) Then
        OutputFile.Writeline strInput
    Else
        dictionary.add strInput, null
        If dictionary.Count >= 0 Then
            objTextFile.Write dictionary.items
        Else
            objTextFile.Write "There are " & dictionary.Count & "  duplicated data in the file."
        End If
    End if

Loop

'# Populate the log file with the duplicated entries
For Each it In dictionary
    .Item  = it & "" & dictionary(it)
    objTextFile.Writeline .Item
Next

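Expected results:

Dictionary to be populated with the duplicates
Log file written with the duplicates
Duplicates to be removed from the final file

Actual results:

Input file opened
Input file read
Output file created
Output file written
Log file opened
Log file written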

Answer 1

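Every dictionary value needs a key, so treating each input line as the key and storing a copy of it as the value is a very simple way to make this work. There is more setup and cleanup code than actual processing code. By the way, if you want something more elaborate, you can store an array as the dictionary value and iterate over both the dictionary and the array values, but it looks like you just want to compare lines.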

dict.Add "Key", Split(line, ",")
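
To make the array idea a little more concrete, here is a minimal sketch. It assumes comma-separated fields and uses the whole line as the key; the sample lines are made up from the question's example and are not real data. Each value is the array returned by `Split`, and the second loop walks the dictionary and reads one field back out of each array.

```vbscript
Option Explicit

Dim dict : Set dict = CreateObject("Scripting.Dictionary")
Dim lines, line, key, fields

' Hypothetical sample data, assuming comma-separated columns
lines = Array("1234,toto,help,111111", _
              "1234,toto,help,111111", _
              "5678,user,user2,111111")

For Each line In lines
    ' Keep only the first occurrence of each line
    If Not dict.Exists(line) Then
        dict.Add line, Split(line, ",")
    End If
Next

' Walk the dictionary: each value is an array of fields
For Each key In dict.Keys
    fields = dict(key)
    WScript.Echo key & " -> card_number=" & fields(3)
Next
```

The same `For Each key In dict.Keys` loop is one way to export a dictionary to a log file: replace `WScript.Echo` with a `WriteLine` call on an open text stream.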

I used your example; 4 of the 6 lines should be unique:

Option Explicit

Dim fso : Set fso = CreateObject("Scripting.FileSystemObject")
Dim fileIn : Set fileIn = fso.OpenTextFile("c:\users\user\desktop\input.txt") ' opens for reading by default
Dim fileOut : Set fileOut = fso.OpenTextFile("c:\users\user\desktop\output.txt", 2, True) ' 2 = for writing, True = create if missing
Dim dictLog : Set dictLog = fso.OpenTextFile("c:\users\user\desktop\dictlog.txt", 2, True) ' 2 = for writing, True = create if missing
Dim dict : Set dict = CreateObject("Scripting.Dictionary")
Dim key
Dim line

Do Until fileIn.AtEndOfStream
    key = fileIn.ReadLine
    line = key

    If Not dict.Exists(key) Then
        ' First occurrence: remember it and keep it in the output
        dict.Add key, line
        fileOut.WriteLine line
    Else
        ' Already seen: record the duplicate in the log instead
        dictLog.WriteLine line
    End If
Loop

fileIn.Close
fileOut.Close
dictLog.Close

' Release the object references (line is a plain string, so it needs no Set)
Set fso     = Nothing
Set fileIn  = Nothing
Set fileOut = Nothing
Set dict    = Nothing
Set dictLog = Nothing
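
The script above compares whole lines, so it would not catch the question's second case, where two rows share a card_number but have different user IDs. A possible variation is sketched below. It assumes the file is comma-separated with the user ID in the first column and the card number in the fourth; the question does not show the real delimiter, so adjust `Split` accordingly. The file names `input.csv`, `output.csv` and `duplicates.log` are placeholders, and a header row, if present, simply passes through as the first "unique" row.

```vbscript
Option Explicit

Const ForReading = 1, ForWriting = 2

Dim fso     : Set fso     = CreateObject("Scripting.FileSystemObject")
Dim users   : Set users   = CreateObject("Scripting.Dictionary")
Dim cards   : Set cards   = CreateObject("Scripting.Dictionary")
' Placeholder file names, resolved against the current directory
Dim fileIn  : Set fileIn  = fso.OpenTextFile("input.csv", ForReading)
Dim fileOut : Set fileOut = fso.OpenTextFile("output.csv", ForWriting, True)
Dim fileLog : Set fileLog = fso.OpenTextFile("duplicates.log", ForWriting, True)
Dim line, fields, userId, cardNumber, dupCount

dupCount = 0
Do Until fileIn.AtEndOfStream
    line = Trim(fileIn.ReadLine)
    If Len(line) > 0 Then
        fields = Split(line, ",")   ' assumed delimiter
        If UBound(fields) >= 3 Then
            userId = Trim(fields(0))
            cardNumber = Trim(fields(3))
            If users.Exists(userId) Or cards.Exists(cardNumber) Then
                ' Either key was seen before: log the row as a duplicate
                fileLog.WriteLine line
                dupCount = dupCount + 1
            Else
                ' New user ID and new card number: keep the row
                users.Add userId, line
                cards.Add cardNumber, line
                fileOut.WriteLine line
            End If
        Else
            ' Rows with fewer than four fields are logged rather than guessed at
            fileLog.WriteLine "SKIPPED: " & line
        End If
    End If
Loop

fileLog.WriteLine "There are " & dupCount & " duplicated rows in the file."

fileIn.Close
fileOut.Close
fileLog.Close
```

Two dictionaries are used so that a match on either field is enough to flag a row. Note that a rejected row does not register its other field, so whether that is the right rule depends on how you want chains of partial matches to be handled.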


vbscript