Fastest way to make list out of IEnumerable in .NET
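After the user selects the target parameters, I am trying to display a large number of values in a WPF data chart. To do this I use LiveCharts for WPF (like this: https://lvcharts.net/App/examples/v1/Wpf/Scrollable), and it works well. To change the values in the chart, I have to call this function: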
' Change values of xAxis
Private Sub ChangeXAxis(axis As Object, title As String, values As Object)
    axis.Labels = values ' has to be array or list of values (strings for X, double for Y)
    axis.Title = title
End Sub
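Before they are displayed, the selected values have to be filtered by timestamp or by parameter. For that I use the following function: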
Public Function FilterListForChart(values As IEnumerable, axis As String, counterStart As Integer, counterEnd As Integer)
    ...
    Try
        If axis = "X" Then
            'Return filtered values for axis
            Dim query As IEnumerable = (From rows In values
                                        Where CInt(rows(3)) > counterStart And CInt(rows(3)) < counterEnd
                                        Select (Math.Round(CDbl(rows(1)), 3))).ToList()
            Return query
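Unfortunately, lvcharts needs the values as a list or an array (of strings or doubles) to display the data correctly. The problem is that when I want to display many values, converting the IEnumerable to a list or array takes a very long time (e.g. 10 seconds or more for >300,000 values).
Because of this, I tried many different things, such as those discussed here: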
- Why IEnumerable slow and List is fast?
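I tried the following options without success:
- Using ToArray()
- Using ToList()
- Using a preallocated list with a defined capacity
- Using a For Each loop instead of a query
- Filtering without rounding (to reduce the data processing)
In my tests I got the following processing times:
- Filtering 180,000 values out of 180,000 in total and displaying them in the chart: X = 1088 ms, Y = 1085 ms, total: 2173 ms
- Filtering 180,000 values out of 1,800,000 and displaying them in the chart: X = 9919 ms, Y = 9983 ms, total: 19902 ms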
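Querying and filtering the data takes only a few milliseconds. Most of the computation time is spent creating the list/array. In my tests I could only reduce the computation time by a few milliseconds.
My current workaround is to display the data when there are fewer than 200,000 values, load the remaining data with a BackgroundWorker, and update the GUI later. But this is not a good solution. The user has to see all values in the chart to evaluate the data, and whenever the user wants to add parameters to the chart, the values have to be reloaded. Filtering the data earlier, during the SQL query, is not a good option either because it takes about as long.
UPDATE:
I tested five more scenarios: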
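A note on that measurement: LINQ queries use deferred execution, so defining a query does almost no work, and the filter and the projection only run when ToList() or ToArray() enumerates the source. Timing the two steps separately therefore tends to attribute the whole cost to the list creation. The following is a minimal sketch of that effect, not the code from this question: the integer source and the Transform helper are hypothetical stand-ins for the real rows and the per-row conversion.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Diagnostics
Imports System.Linq

Module DeferredExecutionDemo
    ' Hypothetical stand-in for the per-row work (conversion and rounding).
    Public Function Transform(value As Integer) As Double
        Return Math.Round(value / 7.0, 3)
    End Function

    Sub Main()
        Dim source As IEnumerable(Of Integer) = Enumerable.Range(0, 1800000)

        Dim sw As Stopwatch = Stopwatch.StartNew()
        ' Defining the query does not enumerate the source yet.
        Dim query = source.Where(Function(v) v > 1000 AndAlso v < 181000).
                           Select(Function(v) Transform(v))
        Console.WriteLine("Define query: " & sw.ElapsedMilliseconds.ToString() & " ms")

        sw.Restart()
        ' ToList() runs the filter and the projection for every element.
        Dim result As List(Of Double) = query.ToList()
        Console.WriteLine("ToList: " & sw.ElapsedMilliseconds.ToString() & " ms, " &
                          result.Count.ToString() & " items")
    End Sub
End Module
```

If the second timing dominates here as well, the time is going into the per-element work inside the query rather than into allocating the list.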
Dim yAxis1 = TestSpeed_ForEach(values_YAxis, "Y", counterStart, counterEnd, decimalCut, False)
Dim yAxis2 = TestSpeed_ForEachFixedList(values_YAxis, "Y", counterStart, counterEnd, decimalCut, False)
Dim yAxis3 = TestSpeed_QueryToArray(values_YAxis, "Y", counterStart, counterEnd, decimalCut, False)
Dim yAxis4 = TestSpeed_QueryToList(values_YAxis, "Y", counterStart, counterEnd, decimalCut, False)
Dim yAxis5 = TestSpeed_QueryToListParallel(values_YAxis, "Y", counterStart, counterEnd, decimalCut, False)
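I timed the functions with a Stopwatch for different numbers of values, from 2,300 to 1,800,000. As mentioned before, I could not speed up the computation by much. The function with the fixed-capacity list was the fastest, but it only saved 50 to 400 ms. Here are the results for a query of 27,500 values out of 1,800,000 in total:
- Time: yAxis1 - 10868 ms
- Time: yAxis2 - 10844 ms
- Time: yAxis3 - 11311 ms
- Time: yAxis4 - 11265 ms
- Time: yAxis5 - 11313 ms
In the second scenario, I tested the constant list capacity with 30,000 and with 500,000 entries, but that had only a small effect.
Here are the functions used: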
'################ TESTING ################
Public Function TestSpeed_ForEach(values As IEnumerable, axis As String, counterStart As Integer, counterEnd As Integer, decimalCut As Integer, isTimeAxis As Boolean)
    Try
        Dim yList As New List(Of Double)
        For Each row In values
            If CInt(row(3)) > counterStart And CInt(row(3)) < counterEnd Then
                yList.Add(Math.Round(CDbl(row.ItemArray(4)), decimalCut))
            End If
        Next
        Return yList
    Catch ex As Exception
        Return Nothing
    End Try
End Function

Public Function TestSpeed_ForEachFixedList(values As IEnumerable, axis As String, counterStart As Integer, counterEnd As Integer, decimalCut As Integer, xIsTimeList As Boolean)
    Try
        Const capacity As Integer = 30000
        Dim yList As New List(Of Double)(capacity)
        For Each row In values
            If CInt(row(3)) > counterStart And CInt(row(3)) < counterEnd Then
                yList.Add(Math.Round(CDbl(row.ItemArray(4)), decimalCut))
            End If
        Next
        Return yList
    Catch ex As Exception
        Return Nothing
    End Try
End Function

Public Function TestSpeed_QueryToList(values As IEnumerable, axis As String, counterStart As Integer, counterEnd As Integer, decimalCut As Integer, isTimeAxis As Boolean)
    Try
        Dim query As IEnumerable = (From rows In values
                                    Where CInt(rows(3)) > counterStart And CInt(rows(3)) < counterEnd
                                    Select (Math.Round(CDbl(rows(4)), decimalCut))).ToList()
        Return query
    Catch ex As Exception
        Return Nothing
    End Try
End Function

Public Function TestSpeed_QueryToArray(values As IEnumerable, axis As String, counterStart As Integer, counterEnd As Integer, decimalCut As Integer, isTimeAxis As Boolean)
    Try
        Dim query As IEnumerable = (From rows In values
                                    Where CInt(rows(3)) > counterStart And CInt(rows(3)) < counterEnd
                                    Select (Math.Round(CDbl(rows(4)), decimalCut))).ToArray()
        Return query
    Catch ex As Exception
        Return Nothing
    End Try
End Function

Public Function TestSpeed_QueryToListParallel(values As IEnumerable, axis As String, counterStart As Integer, counterEnd As Integer, decimalCut As Integer, isTimeAxis As Boolean)
    Try
        Dim query As IEnumerable = (From rows In values
                                    Where CInt(rows(3)) > counterStart And CInt(rows(3)) < counterEnd
                                    Select (Math.Round(CDbl(rows(4)), decimalCut))).AsParallel.ToList()
        Return query
    Catch ex As Exception
        Return Nothing
    End Try
End Function
What else can I do to speed this up?
I agree with the comments. This should be faster, ASSUMING your function returns a list of doubles...
Const capacity As Integer = 1024 * 512
Dim query As New List(Of Double)(capacity)
For Each rows In values
    If CInt(rows(3)) > counterStart AndAlso CInt(rows(3)) < counterEnd Then
        query.Add(Math.Round(CDbl(rows(1)), 3))
    End If
Next
Return query
You will have to edit this and fill in the places marked ???. The point is to test the IEnumerable that is passed to the method.
Public Function TestSpeed_ForEachFixedList(values As IEnumerable(Of ???),
                                           axis As String,
                                           counterStart As Integer,
                                           counterEnd As Integer,
                                           decimalCut As Integer,
                                           xIsTimeList As Boolean) As List(Of Double)
    Dim Nvals As List(Of ???) = values.ToList
    Try
        Const capacity As Integer = 300000
        Dim yList As New List(Of Double)(capacity)
        For Each row As ??? In Nvals
            If CInt(row(3)) > counterStart And CInt(row(3)) < counterEnd Then
                yList.Add(Math.Round(CDbl(row.ItemArray(4)), decimalCut))
            End If
        Next
        Return yList
    Catch ex As Exception
        Return Nothing
    End Try
End Function
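To make the ??? placeholders concrete, here is a minimal sketch that assumes the elements are System.Data.DataRow objects. The row(3) and row.ItemArray(4) accesses suggest that, but the question does not confirm it. The column layout (an Integer counter in column 3, a Double value in column 4) and the names FilterTyped and BuildSampleTable are hypothetical. The sketch reads row(4) instead of row.ItemArray(4), because the ItemArray getter builds a new array of all column values on every access, and it converts the counter only once per row.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data

Module TypedFilterSketch
    ' Assumption: each element is a DataRow; column 3 is the counter, column 4 the value.
    Public Function FilterTyped(rows As IEnumerable(Of DataRow),
                                counterStart As Integer,
                                counterEnd As Integer,
                                decimalCut As Integer) As List(Of Double)
        Dim result As New List(Of Double)()
        For Each row As DataRow In rows
            ' Early-bound access: the compiler knows row is a DataRow.
            Dim counter As Integer = Convert.ToInt32(row(3))
            If counter > counterStart AndAlso counter < counterEnd Then
                result.Add(Math.Round(Convert.ToDouble(row(4)), decimalCut))
            End If
        Next
        Return result
    End Function

    ' Hypothetical sample data, only used to exercise the function.
    Public Function BuildSampleTable(rowCount As Integer) As DataTable
        Dim table As New DataTable()
        For i As Integer = 0 To 3
            table.Columns.Add("c" & i.ToString(), GetType(Integer))
        Next
        table.Columns.Add("c4", GetType(Double))
        For i As Integer = 0 To rowCount - 1
            table.Rows.Add(0, 0, 0, i, i * 1.23456)
        Next
        Return table
    End Function

    Sub Main()
        Dim table As DataTable = BuildSampleTable(10)
        ' DataTable.Select() returns all rows as a DataRow array.
        Dim filtered As List(Of Double) = FilterTyped(table.Select(), 2, 6, 2)
        Console.WriteLine(filtered.Count)
    End Sub
End Module
```

With a typed parameter the loop body compiles to direct calls, whereas a non-generic IEnumerable under Option Strict Off makes every row(3) a late-bound call that is resolved at run time.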
After many tests, I found the solution with the shortest computation time.
Try
    If axis = "X" Then
        Dim query = (From rows In values.AsParallel
                     Where CInt(rows(3)) > counterStart And CInt(rows(3)) < counterEnd
                     Order By CDbl(rows.ItemArray(3)) Ascending
                     Select (String.Format("{0:f" & decimalCut & "}", rows.ItemArray(positionToAsk)))).ToList
        Return query
    ElseIf axis = "Y" Then
        Dim query = (From rows In values.AsParallel
                     Where CInt(rows(3)) > counterStart And CInt(rows(3)) < counterEnd
                     Order By CDbl(rows.ItemArray(3)) Ascending
                     Select (Math.Round(CDbl(rows.ItemArray(4)), decimalCut))).ToList
        Return query
    End If
Catch ex As Exception
    Return Nothing
End Try
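The important findings are:
- The LINQ query has to be declared implicitly (without "Dim query As IEnumerable ...") (thanks @djv)
- Use .AsParallel in the LINQ query to make use of multiple cores
- If you use lists, give them a size up front (.Capacity)
- In some cases a For Each loop is slightly faster than a LINQ query
The optimized query retrieves 27,500 values out of 1,800,000 in 890 ms (previously 11,241 ms).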
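For readers who want to try the first two findings in isolation, here is a minimal sketch of an implicitly typed PLINQ query over strongly typed data. It is not the code from this thread: the Sample structure and the FilterParallel function are hypothetical, and real rows would have to be mapped to a typed shape first.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module ParallelFilterSketch
    ' Hypothetical typed stand-in for one data row.
    Public Structure Sample
        Public Counter As Integer
        Public Value As Double
    End Structure

    Public Function FilterParallel(samples As IEnumerable(Of Sample),
                                   counterStart As Integer,
                                   counterEnd As Integer,
                                   decimalCut As Integer) As List(Of Double)
        ' No "As IEnumerable" here: the compiler infers a strongly typed parallel query.
        Dim query = From s In samples.AsParallel()
                    Where s.Counter > counterStart AndAlso s.Counter < counterEnd
                    Order By s.Counter Ascending
                    Select Math.Round(s.Value, decimalCut)
        ' The ordering set up by Order By is kept when the results are collected.
        Return query.ToList()
    End Function

    Sub Main()
        Dim samples As New List(Of Sample)(10)
        For i As Integer = 9 To 0 Step -1
            samples.Add(New Sample With {.Counter = i, .Value = i * 1.0})
        Next
        For Each value As Double In FilterParallel(samples, 2, 6, 1)
            Console.WriteLine(value)
        Next
    End Sub
End Module
```

Whether the parallel version wins depends on the amount of work per element and on the number of cores, so it is worth measuring both variants on the real data.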