查询列表与 ToDictionary()

Querying List vs ToDictionary()

想象一下您有一个需要搜索的相当大的列表的情况。什么时候将其转换为 Dictionary 以及什么时候只查询列表?

我知道查询列表是一个 O(n) 操作,而查询字典(或 Lookup/Hashset)是一个 O(1)。但是,我完全不确定将任何 O(n) 集合转换为 O(1) 集合的效率。该转换的效率不是 O(n) 本身吗?这是否意味着将 List 转换为 Dictionary 是完全没有意义的,除非您至少查询它 3 次?

在我们讨论的过程中,您在决定某个特定系列时的思考过程是怎样的?您认为什么是最佳做法?

例如(用我的phone写的,无视语法)

public class BigData
{
  public int Id;
  public SubBigData SubBigData;
}

//elsewhere...
public SubBigData GetDataById(int id) 
{
  List<BigData> data = GetDataFromSomewhere();

  return data.Where(d => d.Id == id).Select(d => d.SubBigData).FirstOrDefault();
  //vs
  return data.ToDictionary(d => d.Id, d.SubBigData)[id];
}

如果我能在得到结果之前过滤它,那就太理想了。因此,如果 GetDataFromSomewhere 可以查询,请在那里查询您的数据。

总结

我做了一个测试以显示实际结果:

On a collection of 1 000 000 records, get 1 000 records by ID

您示例中的 Linq 表达式用了 5.7 秒。

一个简化的 Linq 表达式用了 6.6 秒。

词典转换和检索花费了 26 秒。

有趣的是,如果您将整个集合转换为词典并将其用作新来源,则需要 2.7 秒才能获得 1000 条记录。

所以,这里的字典转换更快。 不过,我不确定这是否是一个充分的测试。

代码


using System.Diagnostics;

#region Create and populate collection
List<TestObject> listCollection = new();

for (int i = 0; i < 1000000; i++)
{
    listCollection.Add(new TestObject()
    {
        Id = i,
        Name = $"Name{i}"
    });
}
#endregion Create and populate collection

Stopwatch stopwatch = Stopwatch.StartNew();

Random random = new();

#region Get using Linq
for (int i = 0; i < 1000; i++)
{
    int id = random.Next(0, 2000000);
    _ = listCollection.Where(c => c.Id == id).Select(c => c.Name).FirstOrDefault();
}

Console.WriteLine($"Took {stopwatch.Elapsed} to get 1000 items from list"); //Took 00:00:05.7212037
#endregion Get using Linq

#region Get using Linq (Shorter Expression, takes longer to execute)
stopwatch.Restart();

for (int i = 0; i < 1000; i++)
{
    int id = random.Next(0, 2000000);
    _ = listCollection.FirstOrDefault(c => c.Id == id)?.Name;
}

Console.WriteLine($"Took {stopwatch.Elapsed} to get 1000 items from list"); //Took 00:00:06.6697507
#endregion Get using Linq (Shorter Expression, takes longer to execute)

#region Get using Dictionary
stopwatch.Restart();

for (int i = 0; i < 1000; i++)
{
    try
    {
        int id = random.Next(0, 2000000);
        _ = listCollection.ToDictionary(c => c.Id, c => c.Name)[id];
    }
    catch {
        //Ignore errors when trying to get a value from the Dictionary that does not have a matching key
    }
}
Console.WriteLine($"Took {stopwatch.Elapsed} to get 1000 items from dictionary"); //Took 00:00:26.0257818
#endregion Get using Dictionary

#region Get using cached Dictionary
stopwatch.Restart();

var cachedDictionary = listCollection.ToDictionary(c => c.Id, c => c.Name);
for (int i = 0; i < 1000; i++)
{
    try
    {
        int id = random.Next(0, 2000000);
        _ = cachedDictionary[id];
    }
    catch
    {
        //Ignore errors when trying to get a value from the Dictionary that does not have a matching key
    }
}

Console.WriteLine($"Took {stopwatch.Elapsed} to get 1000 items from cached dictionary"); //Took 00:00:02.7144331
#endregion Get using cached Dictionary

stopwatch.Stop();


public class TestObject
{
    public int Id { get; set; }
    public string Name { get; set; }
}