Fastest way to make list out of IEnumerable in .NET

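After the user selects the target parameters, I am trying to display a large number of values in a WPF data chart. To do this I use LiveCharts for WPF (like this: https://lvcharts.net/App/examples/v1/Wpf/Scrollable), and it works well. To change the values in the chart, I have to call this function: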

' Change values of xAxis
Private Sub ChangeXAxis(axis As Object, title As String, values As Object)
    axis.Labels = values  ' has to be array or list of values (strings for X, double for Y)
    axis.Title = title
End Sub

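The selected values have to be filtered by timestamp or parameter before they are displayed. For that I use the following function: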

    Public Function FilterListForChart(values As IEnumerable, axis As String, counterStart As Integer, counterEnd As Integer)
    ...

    Try
        If axis = "X" Then
            'Return filtered values for axis
            Dim query As IEnumerable = (From rows In values
                                        Where CInt(rows(3)) > counterStart And CInt(rows(3)) < counterEnd
                                        Select (Math.Round(CDbl(rows(1)), 3))).ToList()
            
            Return query

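Unfortunately, lvcharts needs the values as a list or array (of strings or doubles) to display the data correctly. The problem is that converting the IEnumerable to a list or array takes a very long time when I want to display many values (for example, more than 300,000 values take 10 seconds or longer).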

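Because of this, I tried a lot of different things, as discussed here:

I tried the following options without success:

In my tests I got the following processing times: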

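Querying and filtering the data takes only a few milliseconds. Most of the computation time is spent creating the list/array. In my tests I could only reduce the computation time by a few milliseconds.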
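One likely reason for this measurement is LINQ's deferred execution: building a query only describes the work, and the filter and projection actually run when ToList or ToArray enumerates the source. The sketch below illustrates this with made-up integer data (the module name, the source sequence, and the filter are assumptions, not the real chart data):

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Diagnostics
Imports System.Linq

Module DeferredExecutionDemo
    Sub Main()
        ' Assumption: plain integers stand in for the real rows.
        Dim source As IEnumerable(Of Integer) = Enumerable.Range(0, 1800000)

        Dim sw = Stopwatch.StartNew()
        ' Building the query does no filtering yet; it only describes the work.
        Dim query = From n In source
                    Where n Mod 65 = 0
                    Select Math.Round(n / 7.0, 3)
        Console.WriteLine($"Build query: {sw.ElapsedMilliseconds} ms")

        sw.Restart()
        ' ToList enumerates the source and runs Where/Select for every element.
        Dim result As List(Of Double) = query.ToList()
        Console.WriteLine($"ToList: {sw.ElapsedMilliseconds} ms, {result.Count} items")
    End Sub
End Module
```

If that is what happens here, the time attributed to creating the list would really be the per-row cost of the Where and Select clauses.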

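My current solution is to display the data with fewer than 200,000 values, load the remaining data with a BackgroundWorker, and update the GUI later. But this is not a good solution: the user has to see all values in the chart to evaluate the data, and the values have to be reloaded whenever the user adds parameters to the chart. Filtering the data earlier, during the SQL query, is not a good option either, because it takes about as long.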

Update:

I ran another test with five scenarios:

  Dim yAxis1 = TestSpeed_ForEach(values_YAxis, "Y", counterStart, counterEnd, decimalCut, False)
  Dim yAxis2 = TestSpeed_ForEachFixedList(values_YAxis, "Y", counterStart, counterEnd, decimalCut, False)
  Dim yAxis3 = TestSpeed_QueryToArray(values_YAxis, "Y", counterStart, counterEnd, decimalCut, False)
  Dim yAxis4 = TestSpeed_QueryToList(values_YAxis, "Y", counterStart, counterEnd, decimalCut, False)
  Dim yAxis5 = TestSpeed_QueryToListParallel(values_YAxis, "Y", counterStart, counterEnd, decimalCut, False)

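I timed the functions with a stopwatch for different numbers of values, from 2,300 to 1,800,000. As mentioned, I could not speed up the computation much. The function with the fixed-capacity list is the fastest, but it only saves 50 to 400 ms. Here are the results for querying 27,500 values out of a total of 1,800,000:

In the second scenario I tested the constant list capacity with 30,000 and 500,000 entries, but this had only a small effect.

Here are the functions used: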

'################ TESTING ################
Public Function TestSpeed_ForEach(values As IEnumerable, axis As String, counterStart As Integer, counterEnd As Integer, decimalCut As Integer, isTimeAxis As Boolean)
    Try
        Dim yList As New List(Of Double)

        For Each row In values
            If CInt(row(3)) > counterStart And CInt(row(3)) < counterEnd Then
                yList.Add(Math.Round(CDbl(row.ItemArray(4)), decimalCut))
            End If
        Next
        Return yList

    Catch ex As Exception
        Return Nothing
    End Try
End Function

Public Function TestSpeed_ForEachFixedList(values As IEnumerable, axis As String, counterStart As Integer, counterEnd As Integer, decimalCut As Integer, xIsTimeList As Boolean)
    Try
        Const capacity As Integer = 30000
        Dim yList As New List(Of Double)(capacity)

        For Each row In values
            If CInt(row(3)) > counterStart And CInt(row(3)) < counterEnd Then
                yList.Add(Math.Round(CDbl(row.ItemArray(4)), decimalCut))
            End If
        Next
        Return yList

    Catch ex As Exception
        Return Nothing
    End Try
End Function

Public Function TestSpeed_QueryToList(values As IEnumerable, axis As String, counterStart As Integer, counterEnd As Integer, decimalCut As Integer, isTimeAxis As Boolean)
    Try
        Dim query As IEnumerable = (From rows In values
                                    Where CInt(rows(3)) > counterStart And CInt(rows(3)) < counterEnd
                                    Select (Math.Round(CDbl(rows(4)), decimalCut))).ToList()
        Return query

    Catch ex As Exception
        Return Nothing
    End Try
End Function

Public Function TestSpeed_QueryToArray(values As IEnumerable, axis As String, counterStart As Integer, counterEnd As Integer, decimalCut As Integer, isTimeAxis As Boolean)
    Try
        Dim query As IEnumerable = (From rows In values
                                    Where CInt(rows(3)) > counterStart And CInt(rows(3)) < counterEnd
                                    Select (Math.Round(CDbl(rows(4)), decimalCut))).ToArray()
        Return query

    Catch ex As Exception
        Return Nothing
    End Try
End Function

Public Function TestSpeed_QueryToListParallel(values As IEnumerable, axis As String, counterStart As Integer, counterEnd As Integer, decimalCut As Integer, isTimeAxis As Boolean)
    Try
        Dim query As IEnumerable = (From rows In values
                                    Where CInt(rows(3)) > counterStart And CInt(rows(3)) < counterEnd
                                    Select (Math.Round(CDbl(rows(4)), decimalCut))).AsParallel.ToList()
        Return query

    Catch ex As Exception
        Return Nothing
    End Try
End Function

What else can I do to speed this up?

I agree with the comments. This should be faster, ASSUMING your function returns a list of doubles...

    Const capacity As Integer = 1024 * 512

    Dim query As New List(Of Double)(capacity)

    For Each rows In values
        If CInt(rows(3)) > counterStart AndAlso CInt(rows(3)) < counterEnd Then
            query.Add(Math.Round(CDbl(rows(1)), 3))
        End If
    Next

    Return query

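You will have to edit this and fill in the correct type wherever there is a ???. The point is to test the IEnumerable that is passed to the method.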

Public Function TestSpeed_ForEachFixedList(values As IEnumerable(Of ???),
                                            axis As String,
                                            counterStart As Integer,
                                            counterEnd As Integer,
                                            decimalCut As Integer,
                                            xIsTimeList As Boolean) As List(Of Double)

    Dim Nvals As List(Of ???) = values.ToList
    Try
        Const capacity As Integer = 300000
        Dim yList As New List(Of Double)(capacity)

        For Each row As ??? In Nvals
            If CInt(row(3)) > counterStart And CInt(row(3)) < counterEnd Then
                yList.Add(Math.Round(CDbl(row.ItemArray(4)), decimalCut))
            End If
        Next
        Return yList

    Catch ex As Exception
        Return Nothing
    End Try
End Function
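For illustration, here is a minimal sketch of what the typed version could look like if the ??? turns out to be DataRow. That is only an assumption, suggested by the use of row.ItemArray in the question. FilterTyped and the demo table are hypothetical names. With a typed row, row(3) is resolved at compile time instead of through late binding, and reading row(4) avoids row.ItemArray, which as far as I know copies all cell values of the row into a new array on every call.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data

Module TypedRowFilter
    ' Hypothetical helper: column indices 3 (counter) and 4 (value) mirror the question.
    Public Function FilterTyped(rows As IEnumerable(Of DataRow),
                                counterStart As Integer,
                                counterEnd As Integer,
                                decimalCut As Integer) As List(Of Double)
        Dim result As New List(Of Double)()
        For Each row As DataRow In rows
            ' Read and convert the counter once; row(3) is early-bound on DataRow.
            Dim counter As Integer = CInt(row(3))
            If counter > counterStart AndAlso counter < counterEnd Then
                ' row(4) reads a single cell instead of copying the whole row.
                result.Add(Math.Round(CDbl(row(4)), decimalCut))
            End If
        Next
        Return result
    End Function

    Sub Main()
        ' Small made-up table: five Double columns, ten rows.
        Dim table As New DataTable()
        For i = 0 To 4
            table.Columns.Add("c" & i, GetType(Double))
        Next
        For i = 0 To 9
            table.Rows.Add(0.0, 0.0, 0.0, CDbl(i), i * 1.23456)
        Next
        ' DataTable.Select() returns DataRow(), which is an IEnumerable(Of DataRow).
        Dim filtered = FilterTyped(table.Select(), 2, 6, 2)
        Console.WriteLine(filtered.Count)   ' rows with counter 3, 4 and 5
    End Sub
End Module
```

Whether this helps in the real application depends on what the rows actually are, so it is worth timing both variants on the real data.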

After many tests, I found the solution with the shortest computation time.

    Try
        If axis = "X" Then
            Dim query = (From rows In values.AsParallel
                         Where CInt(rows(3)) > counterStart And CInt(rows(3)) < counterEnd
                         Order By CDbl(rows.ItemArray(3)) Ascending
                         Select (String.Format("{0:f" & decimalCut & "}", rows.ItemArray(positionToAsk)))).ToList

            Return query

        ElseIf axis = "Y" Then
            Dim query = (From rows In values.AsParallel
                         Where CInt(rows(3)) > counterStart And CInt(rows(3)) < counterEnd
                         Order By CDbl(rows.ItemArray(3)) Ascending
                         Select (Math.Round(CDbl(rows.ItemArray(4)), decimalCut))).ToList

            Return query
        End If
    Catch ex As Exception
        Return Nothing
    End Try

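The important findings are:

  • The LINQ query has to use an implicit declaration (no Dim query As IEnumerable ...) (thanks to @djv)
  • Use .AsParallel in the LINQ query to make use of multiple cores
  • If you use lists, give them a size up front (.Capacity)
  • In some cases a For Each loop is slightly faster than a LINQ query

The optimized query retrieves 27,500 values out of 1,800,000 in 890 ms (previously 11,241 ms).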
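The first two findings can be sketched in a self-contained form. The Sample structure and FilterParallel are hypothetical stand-ins for the real row type and function, so this is an illustration under those assumptions rather than the code used in the application:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module ParallelFilterSketch
    ' Hypothetical record type standing in for one row of the real data.
    Public Structure Sample
        Public Counter As Integer
        Public Value As Double
    End Structure

    Public Function FilterParallel(samples As IEnumerable(Of Sample),
                                   counterStart As Integer,
                                   counterEnd As Integer,
                                   decimalCut As Integer) As List(Of Double)
        ' Implicitly typed query: the element type stays Sample, so no late binding.
        Dim query = From s In samples.AsParallel()
                    Where s.Counter > counterStart AndAlso s.Counter < counterEnd
                    Order By s.Counter Ascending
                    Select Math.Round(s.Value, decimalCut)
        Return query.ToList()
    End Function

    Sub Main()
        Dim data = Enumerable.Range(0, 1000).
            Select(Function(i) New Sample With {.Counter = i, .Value = i / 3.0}).
            ToList()
        Dim result = FilterParallel(data, 10, 15, 2)
        Console.WriteLine(result.Count)   ' 4 items: counters 11 to 14
    End Sub
End Module
```

The Order By clause is what keeps the output sorted by counter. Without it, a parallel query is free to return elements in any order.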