Split a column in a text file

Question

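I have a system that generates 3 text (.txt) files every day, each containing 1000 entries.

After the text files are generated, we run a VBScript (below) that modifies the files by writing data at specific column positions.

I now need this VBScript to perform an additional task: splitting one column in one of the text files.

So, for example, the file TR201501554s.txt looks like this: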

6876786786  GFS8978976        I
6786786767  DDF78676          I
4343245443  SBSSK67676        I
8393372263  SBSSK56565        I
6545434347  DDF7878333        I
6757650000  SBSSK453          I

With the additional task of splitting the column, the data should look like this, with the column split at a specific position:

6876786786  GFS    8978976      I
6786786767  DDF    78676        I
4343245443  SBSSK  67676        I
8393372263  SBSSK  56565        I
6545434347  DDF    7878333      I
6757650000  SBSSK  453          I

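I was thinking I could add another "case" to accomplish this, perhaps using a "regex" pattern, since the pattern only needs to find 3 companies (DDF, GFS and SBSSK).

But after looking at many examples, I really don't know where to start.

Could someone show me how to accomplish this additional task in our VBScript (below)?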

Option Explicit
Const ForReading = 1
Const ForWriting = 2


Dim objFSO, pFolder, cFile, objWFSO, objFileInput, objFileOutput,strLine
Dim strInputPath, strOutputPath , sName, sExtension
Dim strSourceFileComplete, strTargetFileComplete, objSourceFile, objTargetFile
Dim iPos, rChar
Dim fileMatch


'folder paths
strInputPath = "C:\Scripts\Test"
strOutputPath = "C:\Scripts\Test"

'Create the filesystem object
Set objFSO = CreateObject("Scripting.FileSystemObject")
'Get a reference to the processing folder
Set pFolder = objFSO.GetFolder(strInputPath)

'loop through the folder and get the file names to be processed
For Each cFile In pFolder.Files
    ProcessAFile cFile
Next

Sub ProcessAFile(objFile)
fileMatch = false

Select Case Left(objFile.Name,2)
    Case "MV"
        iPos = 257
        rChar = "YES"
        fileMatch = true
    Case "CA"
        iPos = 45
        rChar = "OCCUPIED"
        fileMatch = true
    Case "TR"
        iPos = 162
        rChar = "EUR"
        fileMatch = true
End Select

If fileMatch = true Then

    Set objWFSO = CreateObject("Scripting.FileSystemObject")
    Set objFileInput = objWFSO.OpenTextFile(objFile.Path, ForReading)
    strSourceFileComplete = objFile.Path
    sExtension = objWFSO.GetExtensionName(objFile.Name)
    sName = Replace(objFile.Name, "." & sExtension, "")

    strTargetFileComplete = strOutputPath & "\" & sName & "_mod." & sExtension
    Set objFileOutput = objFSO.OpenTextFile(strTargetFileComplete, ForWriting, True) 

    Do While Not objFileInput.AtEndOfStream
        strLine = objFileInput.ReadLine
        If Len(strLine) >= iPos Then
            objFileOutput.WriteLine(Left(strLine,iPos-1) & rChar)
        End If
    Loop
    objFileInput.Close
    objFileOutput.Close
    Set objFileInput = Nothing
    Set objFileOutput = Nothing

    Set objSourceFile = objWFSO.GetFile(strSourceFileComplete)
    objSourceFile.Delete
    Set objSourceFile = Nothing

    Set objTargetFile = objWFSO.GetFile(strTargetFileComplete)
    objTargetFile.Move strSourceFileComplete    
    Set objTargetFile = Nothing
    Set objWFSO = Nothing
End If
End Sub

Answer 1

First, do a regular-expression replacement using the pattern (\d+)\s+([A-Z]+)(\d+)\s+(\w+), replacing it with

and then split on +. That's all.

Live demo

Answer 2

You can add a regular expression replacement to your input processing loop. Since you want to re-format the columns, I'd do it with a replacement function. Define the regular expression and the function in the global scope:

...
Set pFolder = objFSO.GetFolder(strInputPath)

Dim re  ' needed because the script uses Option Explicit
Set re = New RegExp
re.Pattern = "  ([A-Z]+)(\d+)( +)"

Function ReFormatCol(m, g1, g2, g3, p, s)
  ReFormatCol = Left("  " & Left(g1 & "    ", 7) & g2 & g3, Len(m)+2)
End Function

'loop through the folder and get the file names to be processed
For Each cFile In pFolder.Files
...

And modify the input processing loop like this:

...
Do While Not objFileInput.AtEndOfStream
  strLine = re.Replace(objFileInput.ReadLine, GetRef("ReFormatCol"))
  If Len(strLine) >= iPos Then
    objFileOutput.WriteLine(Left(strLine,iPos-1) & rChar)
  End If
Loop
...

Note that you may need to change the iPos value, because splitting and reformatting the column increases the length of the line by 2 characters.
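
The question asks for the split in only one of the files, while the loop above reformats every line of every processed file. One possible variation, sketched here under the assumption that only files whose name starts with TR should be split, wraps the replacement in a small helper. The name SplitIfTR is hypothetical and not part of the original script; the sketch is meant to be run on its own with cscript.exe.

```vbscript
Option Explicit

Dim re
Set re = New RegExp
re.Pattern = "  ([A-Z]+)(\d+)( +)"

' Same callback as in the answer above.
Function ReFormatCol(m, g1, g2, g3, p, s)
  ReFormatCol = Left("  " & Left(g1 & "    ", 7) & g2 & g3, Len(m) + 2)
End Function

' Hypothetical helper (an assumption, not in the original script):
' reformat the line only when the file name starts with "TR".
Function SplitIfTR(fileName, line)
  If Left(fileName, 2) = "TR" Then
    SplitIfTR = re.Replace(line, GetRef("ReFormatCol"))
  Else
    SplitIfTR = line
  End If
End Function

' Sample line taken from the question; the MV file name is made up for the demo.
WScript.Echo SplitIfTR("TR201501554s.txt", "6786786767  DDF78676          I")
WScript.Echo SplitIfTR("MV201501554s.txt", "6786786767  DDF78676          I")
```

In the real script, the file name would come from objFile.Name inside ProcessAFile. If only the TR lines grow by 2 characters, only the iPos value of the "TR" case might need adjusting.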

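The callback function ReFormatCol has the following (required) parameters:

m: the regular expression match (used to determine the length of the match)
g1, g2, g3: the three groups from the expression
p: the starting position of the match in the source string (not used here)
s: the source string (not used here)

The function builds the replacement for the match from the 3 groups as follows: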

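Left(g1 & "    ", 7) appends 4 spaces to the first group (e.g. GFS) and trims the result to 7 characters. This is based on the assumption that the first group is always 3-5 characters long.
→ "GFS    "
"  " & ... & g2 & g3 prepends 2 spaces to the result of the above operation and appends the other 2 groups (8978976 and the trailing spaces).
→ "  GFS    8978976        "
Left(..., Len(m)+2) then trims the resulting string to the length of the original match plus 2 characters (to account for the 2 extra spaces inserted to separate the new second column from the former second column, which is now the third).
→ "  GFS    8978976      "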
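
The three steps above can be traced by hand without the regular expression. The following is a minimal sketch, assuming it is run with cscript.exe. The variables step1, step2 and step3 are illustrative names, and the group values are those the callback would receive for the first sample line of the question. The square brackets are there only to make the trailing spaces visible.

```vbscript
Option Explicit

Dim m, g1, g2, g3, step1, step2, step3

' Values for the sample line "6876786786  GFS8978976        I"
g1 = "GFS"
g2 = "8978976"
g3 = Space(8)                     ' the 8 spaces that follow the number
m  = "  " & g1 & g2 & g3          ' the whole match, 20 characters

step1 = Left(g1 & "    ", 7)      ' pad the company code to 7 characters
step2 = "  " & step1 & g2 & g3    ' re-attach the leading spaces and the other groups
step3 = Left(step2, Len(m) + 2)   ' keep the match length plus 2 characters

WScript.Echo "[" & step1 & "]"    ' [GFS    ]
WScript.Echo "[" & step2 & "]"    ' [  GFS    8978976        ]
WScript.Echo "[" & step3 & "]"    ' [  GFS    8978976      ]
```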

regex

vbscript