Replace LF with CRLF in files

Is there a low-cost way to test whether the first line of a file is terminated by LF rather than CRLF?

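We receive a lot of files from customers, and some of them arrive with LF as the end-of-line terminator instead of CRLF. We import with SSIS, so I need the line terminators to be consistent. (When I open the files in Notepad++, I can see that the lines end in LF rather than CRLF.)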

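If I read the first line of the file with StreamReader.ReadLine, the line doesn't appear to contain any kind of terminator. I tested line.Contains(vbLf), and the same with vbCr and vbCrLf, and all of them returned False.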

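I suppose I could read the whole file into memory and test for vbLf, but some of the files we receive are quite large (25MB), and that seems like a huge waste of resources just to check the terminator of the first line. Worst case, I could rewrite every line of every file we receive as line + System.Environment.NewLine, but that is also wasteful for the files that already use CRLF.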

Edit: The final code below is based on @icemanind's answer (an SSIS Script Task that is passed a directory variable).

Public Sub Main()
'Gets the directory and a listing of the files and calls the sub

    Dim sPath As String
    sPath = Dts.Variables("User::DataSourceDir").Value.ToString
    Dim sDirectory As String = sPath
    Dim dirList As New DirectoryInfo(sDirectory)
    Dim fileList As FileInfo() = dirList.GetFiles()

    For Each fileName As FileInfo In fileList
        ReplaceBadEol(fileName)
    Next

    Dts.TaskResult = ScriptResults.Success
End Sub

'Temp filename postfix
Private Const fileNamePostFix As String = "_Temp.txt"

'Tests to see if the file has a valid end of line terminator and fixes if it doesn't
Private Sub ReplaceBadEol(currentFileInfo As FileInfo)
    Dim fullName As String = currentFileInfo.FullName
    If FirstLineEndsWithCrLf(fullName) Then Exit Sub
    Dim fileContent As String() = GetFileContent(currentFileInfo.FullName)
    Dim pureFileName As String = Path.GetFileNameWithoutExtension(fullName)
    Dim newFileName As String = Path.Combine(currentFileInfo.DirectoryName, pureFileName & fileNamePostFix)
    File.WriteAllLines(newFileName, fileContent)
    currentFileInfo.Delete()
    File.Move(newFileName, fullName)
End Sub

'Enum to provide info on the return
Private Enum Terminators
    None = 0
    CrLf = 1
    Lf = 2
    Cr = 3
End Enum

'Eol test: skips ahead by the first line's character count (a true byte offset only for single-byte encodings), then scans for the first line terminator
Private Function GetTerminator(fileName As String, length As Integer) As Terminators
    Using sr As New StreamReader(fileName)
        sr.BaseStream.Seek(length, SeekOrigin.Begin)
        Dim data As Integer = sr.Read()

        While data <> -1
            If data = 13 Then
                data = sr.Read()
                If data = 10 Then
                    Return Terminators.CrLf
                End If
                Return Terminators.Cr
            End If
            If data = 10 Then
                Return Terminators.Lf
            End If
            data = sr.Read()
        End While
    End Using

    Return Terminators.None
End Function

'Returns True if the file is empty or its first line ends with CrLf
Private Function FirstLineEndsWithCrLf(fileName As String) As Boolean
    Dim length As Integer

    Using reader As New System.IO.StreamReader(fileName)
        Dim line As String = reader.ReadLine()
        'ReadLine returns Nothing for an empty file, so check before using line.Length
        If line Is Nothing Then Return True
        length = line.Length
    End Using

    'Compare against the enum member rather than the literal 1
    Return GetTerminator(fileName, length) = Terminators.CrLf
End Function

'Reads all lines into String Array
Private Function GetFileContent(fileName As String) As String()
    Return File.ReadAllLines(fileName)
End Function

The reason your line tests negative for vbCrLf, vbLf, and vbCr is that ReadLine strips them. From the StreamReader.ReadLine documentation:

A line is defined as a sequence of characters followed by a line feed ("\n"), 
a carriage return ("\r"), or a carriage return immediately followed by a line 
feed ("\r\n"). The string that is returned does not contain the terminating 
carriage return or line feed.
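
To see the stripping described above without touching a real file, here is a minimal sketch. FirstLineOf is a hypothetical helper, not part of the original code; it wraps an in-memory stream in a StreamReader and returns whatever ReadLine reports.

```vbnet
Imports System.IO
Imports System.Text

Module ReadLineDemo
    'Hypothetical helper: returns the first line of the text as ReadLine sees it.
    Function FirstLineOf(text As String) As String
        Using stream As New MemoryStream(Encoding.ASCII.GetBytes(text))
            Using reader As New StreamReader(stream)
                'ReadLine drops the terminator, whichever kind it was
                Return reader.ReadLine()
            End Using
        End Using
    End Function
End Module
```

FirstLineOf("abc" & vbLf & "def") and FirstLineOf("abc" & vbCrLf & "def") both return "abc", so calling Contains(vbLf) on the result cannot distinguish the two files.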

If you want all the lines joined with a carriage return and line feed, try this:

Dim lines As String() = File.ReadAllLines("myfile.txt")
Dim data As String = lines.Aggregate(Function(i, j) i + VbCrLf + j)

This reads in all the lines of the file and then uses a bit of LINQ to join them with a carriage return and line feed.
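
One thing to watch for: Aggregate without a seed throws InvalidOperationException on an empty sequence, which is what ReadAllLines returns for an empty file. As an alternative sketch, String.Join produces the same result for non-empty input and an empty string otherwise. JoinWithCrLf is a hypothetical name used only for illustration.

```vbnet
Module JoinLinesDemo
    'Hypothetical helper: joins the lines with CrLf.
    'Returns an empty string when the array has no elements.
    Function JoinWithCrLf(lines As String()) As String
        Return String.Join(vbCrLf, lines)
    End Function
End Module
```

It could be called as JoinWithCrLf(File.ReadAllLines("myfile.txt")), with "myfile.txt" being the same placeholder file name as above.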

Edit

If you just want to determine what the first line terminator is, try this function:

Private Enum Terminators
    None = 0
    CrLf = 1
    Lf = 2
    Cr = 3
End Enum

Private Shared Function GetTerminator(fileName As String) As Terminators
    Using sr = New StreamReader(fileName)
        Dim data As Integer = sr.Read()

        While data <> -1
            If data = 13 Then
                data = sr.Read()
                If data = 10 Then
                    Return Terminators.CrLf
                End If
                Return Terminators.Cr
            End If
            If data = 10 Then
                Return Terminators.Lf
            End If
            data = sr.Read()
        End While
    End Using

    Return Terminators.None
End Function

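Just call this function with a file name. It returns Terminators.Cr, Terminators.Lf, or Terminators.CrLf for the first line terminator it finds, or Terminators.None if the file contains no line terminator.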
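
Once a file is known to need fixing, the question's concern about large files (around 25MB) suggests rewriting it line by line rather than loading it whole. The following is a minimal sketch of that idea, not the code from the question. RewriteWithCrLf is a hypothetical name, and the sketch assumes the default StreamReader and StreamWriter encodings (UTF-8) suit the files; it also terminates the last line with CrLf even if the source did not.

```vbnet
Imports System.IO

Module EolConverter
    'Hypothetical helper: copies sourcePath to targetPath one line at a time,
    'ending every line with CrLf. Only one line is held in memory at once.
    Sub RewriteWithCrLf(sourcePath As String, targetPath As String)
        Using reader As New StreamReader(sourcePath)
            Using writer As New StreamWriter(targetPath)
                'Force CrLf regardless of the platform default
                writer.NewLine = vbCrLf
                Dim line As String = reader.ReadLine()
                While line IsNot Nothing
                    writer.WriteLine(line)
                    line = reader.ReadLine()
                End While
            End Using
        End Using
    End Sub
End Module
```

A caller would check GetTerminator first and invoke RewriteWithCrLf only when the result is not Terminators.CrLf, so files that are already correct are never rewritten.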