.NET: Processing large files using Parallel.ForEach is not keeping order
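I am writing VB.NET code to process a large text file line by line. The file has 19 columns, delimited by a quote followed by a comma and another quote (i.e. ","). I split each line on that delimiter and then check every column for any extra double quotes it still contains; if one is found, I log it to a database table. The processing itself works, but the order is not maintained. I need to write to the database table which row and which column contain the delimiter characters (for further processing), and the loop does not seem to capture the row order.
**For example**: suppose the input file contains 10 rows of 19 columns each, and an invalid delimiter character is found in the last column (column 19) of row 1 and in the last column of row 10. The logging call (WriteLogToDb(Path.GetFileName(FileAlign.Common.InputFileName), rownum, colnum, ...)) then behaves like this: row 1 is reported correctly, but instead of reporting that row 10 contains the bad column value, the Parallel.ForEach loop reports that row 5 has a bad column at position 19. The column numbers appear to be correct.
What am I doing wrong here? Are there any alternatives?
Sample file, line 1 -->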
"col1val"","col2val"",""","cccccc"","xxxxxx"","xxxx"","12334""," 331"",35344535"","xxx"","WA"","50000"",""","03/01/2000"",""",""","" ",""",""Lin1Col19"
Line 2 -->
Line 3 -->
... etc.
Line 9 -->
"col1val"","col2val"",""","aaaa"","xxxxxx"","xxxx"","4242"","6464"",533535353 "","xxx"","PA"","6446"",""","04/01/1967"",""",""","" ",""",""Lin1Col19"
Here is the code sample.
    Public Sub ValidateExtraDoubleQuotes(FileName As String)
        Dim InputFile As String = FileName
        Dim rownum As Integer = 0
        Dim colnum As Integer = 0
        Dim SplittedValues() As String
        Dim delimiter As String = """,""" '/*ie delimiter is ","*/
        Dim QT As String = """" 'escape single doublequote by adding another
        Dim ExtraQTFound As Boolean = False
        Dim QTRowCount As Long = 0
        Dim messagesLockRow As New Object
        Dim messagesLockCol As New Object
        Try
            Parallel.ForEach(File.ReadLines(InputFile), Sub(line As String)
                Console.WriteLine(line)
                SyncLock messagesLockRow
                    rownum += 1
                    '/*' remove first and last chars from each line for further processing(ie. extra double quotes) *
                    line = (line.Remove(0, 1)).Remove(line.Length - 2, 1)
                    SplittedValues = line.Split(New String() {delimiter}, StringSplitOptions.None)
                    SyncLock messagesLockCol
                        For Each Str As String In SplittedValues
                            colnum += 1
                            If Str.Contains(QT) Then
                                ExtraQTFound = True
                                WriteLogToDb(Path.GetFileName(FileAlign.Common.InputFileName), rownum, colnum, False, "Extra Double Quotes for-->" & Str)
                            End If
                        Next
                    End SyncLock
                    colnum = 0
                    ExtraQTFound = False
                End SyncLock
            End Sub)
        Catch ex As Exception
            Console.Write(String.Concat("Exception!!", ex.Message.ToString()))
        End Try
    End Sub
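With Parallel.ForEach you are telling your code to process multiple lines at the same time by distributing the work across different threads. You cannot guarantee the order when you do that. In your code, rownum is a shared counter that is incremented in whatever order the threads happen to acquire the lock, so it reflects processing order rather than the line's position in the file. You should use parallelism only when order does not matter, or when you have your own way of reordering the data after execution.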
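As for alternatives: you could read each line into an array, so that the data sits in an object that maintains order, and then use Parallel.ForEach to do the processing. If an error is found at that point, you can pass the error message together with the array index to a collecting object, and have it write to the database in array-index order once all processing has finished. This link shows how to use the index inside a Parallel.ForEach loop: http://www.blackwasp.co.uk/ParallelForEachIndex.aspx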
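The index-based approach could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a drop-in replacement: the names QuoteIssue and FindExtraDoubleQuotes are hypothetical, the sketch skips lines shorter than two characters (the original code does not), and it uses the Parallel.ForEach overload that passes the element index to the loop body, so the lines are not copied into an array first. Findings are collected in a ConcurrentBag and sorted by row and column after the loop completes.

```vb
Imports System
Imports System.Collections.Concurrent
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports System.Threading.Tasks

Module OrderedValidation

    ' Hypothetical record type for one finding (not part of the original code).
    Public Class QuoteIssue
        Public Property Row As Long
        Public Property Col As Integer
        Public Property Value As String
    End Class

    ' Scans the file in parallel and returns the findings sorted by row, then column.
    Public Function FindExtraDoubleQuotes(fileName As String) As List(Of QuoteIssue)
        Dim delimiter As String = """,""" ' the three characters ","
        Dim qt As String = """"           ' a single double quote
        Dim issues As New ConcurrentBag(Of QuoteIssue)()

        ' This overload supplies the zero-based position of each element in the
        ' source sequence, so no shared row counter is needed.
        Parallel.ForEach(File.ReadLines(fileName),
            Sub(line As String, state As ParallelLoopState, index As Long)
                ' Assumption: lines too short to hold both outer quotes are skipped.
                If line.Length < 2 Then Return

                ' Strip the first and last character (the outer quotes).
                Dim inner As String = line.Substring(1, line.Length - 2)
                Dim values() As String = inner.Split(New String() {delimiter}, StringSplitOptions.None)

                ' All per-line state is local to this lambda, so no SyncLock is needed.
                For col As Integer = 0 To values.Length - 1
                    If values(col).Contains(qt) Then
                        issues.Add(New QuoteIssue With {
                            .Row = index + 1,
                            .Col = col + 1,
                            .Value = values(col)})
                    End If
                Next
            End Sub)

        ' Restore file order only after all parallel work has finished.
        Return issues.OrderBy(Function(i) i.Row).ThenBy(Function(i) i.Col).ToList()
    End Function

End Module
```

The caller could then loop over the returned list sequentially and call WriteLogToDb once per entry, passing the entry's Row and Col in place of rownum and colnum, so the database receives the findings in file order. Keeping the database writes outside the parallel loop also removes the need for the nested SyncLock blocks.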