Query a data set using numbers in a file

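I am trying to use VBScript to query a file containing a list of people's data, using only an identification number. Currently I have a data set file that contains all of the people's data, and a query file that contains the ID numbers whose records I want to pull from the data set. Whenever an ID matches, I want to write that line to a results file.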

Here is a general example of the data contained in the data set file and the query file.

Data set:

ID,Name,City,State,Zipcode,Phone
1885529946,Hall,Louisville,KY,40208,5026366683
1886910320,Brown,Sacramento,CA,95814,5302981550
1953250581,Rios,Sterling,OK,73567,5803658077
1604767393,Barner,Irvine,CA,92714,9494768597
1713746771,Herrera,Stotts City,MO,65756,4172852393
1022686106,Moore,Ocala,FL,34471,3526032811
1579121274,Beyer,Alexandria,MD,22304,3013838430
1288569655,Rondeau,Augusta,GA,30904,7066671404
1954615404,Angel,Los Angeles,CA,90014,5622961806
1408747874,Lagasse,Traverse City,MI,49686,2318182792

Query file:

1885529946
1713746771
1408747874

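I am able to read all the lines in the query file and display the ID numbers using WScript.Echo. No errors are generated, but the script never ends and no results file is generated. The results file should contain only the lines from the data set that match the ID numbers. For example: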

1885529946,Hall,Louisville,KY,40208,5026366683
1713746771,Herrera,Stotts City,MO,65756,4172852393
1408747874,Lagasse,Traverse City,MI,49686,2318182792

Here is the script I am trying to use:

Const intForReading = 1
Const intForWriting = 2
Const intForAppending = 8

strQueryFile = "C:\numbers_test.txt"
strDataSetFile = "C:\data_test.csv"
strOutputFile = "C:\results_test.csv"

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFileToRead = objFSO.OpenTextFile(strQueryFile, intForReading)
Set objFileToQuery = objFSO.OpenTextFile(strDataSetFile, intForReading)
Set objFileToWrite = objFSO.OpenTextFile(strOutputFile, intForWriting, True)

Do Until objFileToQuery.AtEndOfStream
    Do Until objFileToRead.AtEndOfStream
        strNumber = objFileToRead.ReadLine()
        WScript.Echo strNumber
        strLine = objFileToQuery.ReadLine()
        If InStr(strLine,strNumber) > 0 Then strFoundText = strLine
        objFileToWrite.WriteLine(strFoundText)
    Loop
Loop

objFileToQuery.Close
objFileToRead.Close
objFileToWrite.Close

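The problem in your code is that these files are opened as streams. Once such a stream reaches its end (i.e. .AtEndOfStream becomes true, for example after repeated calls to .ReadLine()), it does not magically rewind to the beginning of the file. With your sample data, the inner loop consumes the whole query file on its first pass and never runs again, so nothing advances the data set file any further and the outer loop spins forever. Your "nested loop" approach would need to rewind the query file in order to work.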
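To make the rewinding point concrete, here is a minimal sketch of what the nested-loop approach would need. It is only an illustration, not a recommendation: the variable names are my own, the file paths are the ones from the question, and I am assuming the ID is always the first column, so the match is done against the start of the line rather than with a bare InStr (which could also hit a phone number).

```vbscript
Option Explicit
Const ForReading = 1

Dim fso, dataFile, queryFile, dataLine, queryId
Set fso = CreateObject("Scripting.FileSystemObject")
Set dataFile = fso.OpenTextFile("C:\data_test.csv", ForReading)

Do Until dataFile.AtEndOfStream
    ' Read exactly one data line per pass of the outer loop
    dataLine = dataFile.ReadLine()

    ' Reopening the query file is the "rewind": the stream starts at line 1 again
    Set queryFile = fso.OpenTextFile("C:\numbers_test.txt", ForReading)
    Do Until queryFile.AtEndOfStream
        queryId = queryFile.ReadLine()
        ' Assumption: the ID is the first column, so the line must start with "<id>,"
        If Len(queryId) > 0 And Left(dataLine, Len(queryId) + 1) = queryId & "," Then
            WScript.Echo dataLine
            Exit Do
        End If
    Loop
    queryFile.Close
Loop

dataFile.Close
```

Note that the query file is opened once for every line of the data set, which is where the cost of this approach comes from.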

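Rewinding can be done by closing and reopening the stream, but that is not very efficient. Comparing all of the numbers against every line of the input file is not very efficient either. I suggest using a Dictionary object to store the numbers from the query file. Dictionaries store key-value pairs and are optimized for very fast key lookup (via .Exists(someKey)), so they are a good fit for this task.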

With a dictionary you can find out very quickly whether a line should be written to the output file:

Const intForReading = 1
Const intForWriting = 2
Const intForAppending = 8

strQueryFile = "C:\numbers_test.txt"
strDataSetFile = "C:\data_test.csv"
strOutputFile = "C:\results_test.csv"

Set objFSO = CreateObject("Scripting.FileSystemObject")

' first import the query file into a dictionary for easy lookup
Set numbers = CreateObject("Scripting.Dictionary")    
With objFSO.OpenTextFile(strQueryFile, intForReading)
    Do Until .AtEndOfStream
        ' we are only interested in the key for this task, the value is completely irrelevant.
        numbers.Add .ReadLine(), ""
    Loop
    .Close
End With

Set objFileToWrite = objFSO.OpenTextFile(strOutputFile, intForWriting, true)    
With objFSO.OpenTextFile(strDataSetFile, intForReading)
    Do Until .AtEndOfStream
        line = .ReadLine()
        columns = Split(line, ",")
        currentNumber = columns(0)
        If numbers.Exists(currentNumber) Then objFileToWrite.WriteLine(line)
    Loop
    .Close
End With

objFileToWrite.Close
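One thing to watch for with the script above: Dictionary.Add raises an error if the same key is added twice, so a query file with a repeated ID (or two blank lines) would stop the import. A hedged sketch of a more defensive import follows. The inline Array stands in for the lines of the query file, and the names rawLine and id are purely illustrative.

```vbscript
Option Explicit

Dim numbers, rawLine, id
Set numbers = CreateObject("Scripting.Dictionary")

' The array stands in for lines read from the query file:
' one duplicate, one blank line, one value with stray spaces
For Each rawLine In Array("1885529946", "", " 1713746771 ", "1885529946")
    id = Trim(rawLine)
    If Len(id) > 0 Then
        ' Only add keys that are not present yet, so duplicates cannot raise an error
        If Not numbers.Exists(id) Then numbers.Add id, ""
    End If
Next

WScript.Echo numbers.Count   ' 2 distinct IDs were stored
```

The same Trim / Len / Exists checks can be dropped into the Do Until loop that reads the query file.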

When the need arises, I like to use ADODB for this kind of task and treat the input files as a database. The trick is typically to find the right connection string for your system and to use a Schema.ini file:

option explicit

Const adClipString = 2

dim ado: set ado = CreateObject("ADODB.Connection")
' data files are in this folder
' using the old JET driver
ado.ConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=.\;Extended Properties=""text;HDR=Yes;FMT=Delimited"";"
' or maybe use ACE if installed
' ado.ConnectionString = "Driver=Microsoft Access Text Driver (*.txt, *.csv);Dbq=.\;Extensions=asc,csv,tab,txt;"
ado.open

' query is in a CSV too, so we can access as a table
' the column names are given in Schema.ini
const QUERY = "SELECT * FROM [data_test.csv] WHERE ID IN (SELECT ID FROM [query_test.csv])"
' or literals 
' const QUERY = "SELECT * FROM [data_test.csv] WHERE ID IN ('1885529946', '1713746771', '1408747874')"

dim rs: set rs = ado.Execute(QUERY)

' convenient GetString() method allows formatting the result
' this could be written to file instead of outputting to console
WScript.Echo rs.GetString(adClipString, , vbTab, vbNewLine, "[NULL]")

'or create a new table!
'delete results table if exists
' catch an error if the table does not exist
on error resume next
' for some reason you need to use #csv not .csv here
ado.Execute "DROP TABLE result#csv"
if err then
    WScript.Echo err.description
end if
on error goto 0

ado.Execute("SELECT * INTO [result.csv] FROM [data_test.csv] WHERE ID IN (SELECT ID FROM [query_test.csv])")

rs.close
ado.close

Schema.ini file:

[data_test.csv]
Format=CSVDelimited
ColNameHeader=True

Col1=ID Text
Col2=Name Text
Col3=City Text
Col4=State Text
Col5=Zipcode Text
Col6=Phone Text

[query_test.csv]
Format=CSVDelimited
ColNameHeader=False

Col1=ID Text
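The comment next to GetString() in the script says the result could be written to a file instead of the console. A minimal sketch of that, under a few assumptions: it reuses the Jet connection string and query from the script, the output name results_from_ado.csv is made up for illustration, and fields are joined with plain commas without any quoting, so it only suits data that contains no commas.

```vbscript
Option Explicit

Const adClipString = 2
Const ForWriting = 2

Dim ado, rs, fso, outFile
Set ado = CreateObject("ADODB.Connection")
' Same connection string as in the script above (data files in the current folder)
ado.ConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=.\;Extended Properties=""text;HDR=Yes;FMT=Delimited"";"
ado.Open

Set rs = ado.Execute("SELECT * FROM [data_test.csv] WHERE ID IN (SELECT ID FROM [query_test.csv])")

Set fso = CreateObject("Scripting.FileSystemObject")
' Hypothetical output file name; True creates the file if it does not exist
Set outFile = fso.OpenTextFile("results_from_ado.csv", ForWriting, True)

' Only call GetString() when there is at least one row to format
If Not rs.EOF Then
    ' Comma between columns, new line between rows, empty string for NULLs
    outFile.Write rs.GetString(adClipString, , ",", vbNewLine, "")
End If

outFile.Close
rs.Close
ado.Close
```

Unlike the SELECT ... INTO variant, this writes no header row, which matches the expected output shown in the question.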