查询列表与 ToDictionary()
Querying List vs ToDictionary()
想象一下您有一个需要搜索的相当大的列表的情况。什么时候将其转换为 Dictionary 以及什么时候只查询列表?
我知道查询列表是一个 O(n) 操作,而查询字典(或 Lookup/Hashset)是一个 O(1)。但是,我完全不确定将任何 O(n) 集合转换为 O(1) 集合的效率。该转换的效率不是 O(n) 本身吗?这是否意味着将 List 转换为 Dictionary 是完全没有意义的,除非您至少查询它 3 次?
在我们讨论的过程中,您在决定某个特定系列时的思考过程是怎样的?您认为什么是最佳做法?
例如(用我的phone写的,无视语法)
public class BigData
{
public int Id;
public SubBigData SubBigData;
}
//elsewhere...
public SubBigData GetDataById(int id)
{
List<BigData> data = GetDataFromSomewhere();
return data.Where(d => d.Id == id).Select(d => d.SubBigData).FirstOrDefault();
//vs
return data.ToDictionary(d => d.Id, d.SubBigData)[id];
}
如果我能在得到结果之前过滤它,那就太理想了。因此,如果 GetDataFromSomewhere
可以查询,请在那里查询您的数据。
总结
我做了一个测试以显示实际结果:
On a collection of 1 000 000 records, get 1 000 records by ID
您示例中的 Linq
表达式用了 5.7
秒。
一个简化的 Linq
表达式用了 6.6
秒。
词典转换和检索花费了 26
秒。
有趣的是,如果您将整个集合转换为词典并将其用作新来源,则需要 2.7
秒才能获得 1000 条记录。
所以,这里的字典转换更快。 不过,我不确定这是否是一个充分的测试。
代码
using System.Diagnostics;
#region Create and populate collection
List<TestObject> listCollection = new();
for (int i = 0; i < 1000000; i++)
{
listCollection.Add(new TestObject()
{
Id = i,
Name = $"Name{i}"
});
}
#endregion Create and populate collection
Stopwatch stopwatch = Stopwatch.StartNew();
Random random = new();
#region Get using Linq
for (int i = 0; i < 1000; i++)
{
int id = random.Next(0, 2000000);
_ = listCollection.Where(c => c.Id == id).Select(c => c.Name).FirstOrDefault();
}
Console.WriteLine($"Took {stopwatch.Elapsed} to get 1000 items from list"); //Took 00:00:05.7212037
#endregion Get using Linq
#region Get using Linq (Shorter Expression, takes longer to execute)
stopwatch.Restart();
for (int i = 0; i < 1000; i++)
{
int id = random.Next(0, 2000000);
_ = listCollection.FirstOrDefault(c => c.Id == id)?.Name;
}
Console.WriteLine($"Took {stopwatch.Elapsed} to get 1000 items from list"); //Took 00:00:06.6697507
#endregion Get using Linq (Shorter Expression, takes longer to execute)
#region Get using Dictionary
stopwatch.Restart();
for (int i = 0; i < 1000; i++)
{
try
{
int id = random.Next(0, 2000000);
_ = listCollection.ToDictionary(c => c.Id, c => c.Name)[id];
}
catch {
//Ignore errors when trying to get a value from the Dictionary that does not have a matching key
}
}
Console.WriteLine($"Took {stopwatch.Elapsed} to get 1000 items from dictionary"); //Took 00:00:26.0257818
#endregion Get using Dictionary
#region Get using cached Dictionary
stopwatch.Restart();
var cachedDictionary = listCollection.ToDictionary(c => c.Id, c => c.Name);
for (int i = 0; i < 1000; i++)
{
try
{
int id = random.Next(0, 2000000);
_ = cachedDictionary[id];
}
catch
{
//Ignore errors when trying to get a value from the Dictionary that does not have a matching key
}
}
Console.WriteLine($"Took {stopwatch.Elapsed} to get 1000 items from cached dictionary"); //Took 00:00:02.7144331
#endregion Get using cached Dictionary
stopwatch.Stop();
public class TestObject
{
public int Id { get; set; }
public string Name { get; set; }
}
想象一下您有一个需要搜索的相当大的列表的情况。什么时候将其转换为 Dictionary
我知道查询列表是一个 O(n) 操作,而查询字典(或 Lookup/Hashset)是一个 O(1)。但是,我完全不确定将任何 O(n) 集合转换为 O(1) 集合的效率。该转换的效率不是 O(n) 本身吗?这是否意味着将 List 转换为 Dictionary 是完全没有意义的,除非您至少查询它 3 次?
在我们讨论的过程中,您在决定某个特定系列时的思考过程是怎样的?您认为什么是最佳做法?
例如(用我的phone写的,无视语法)
public class BigData
{
public int Id;
public SubBigData SubBigData;
}
//elsewhere...
public SubBigData GetDataById(int id)
{
List<BigData> data = GetDataFromSomewhere();
return data.Where(d => d.Id == id).Select(d => d.SubBigData).FirstOrDefault();
//vs
return data.ToDictionary(d => d.Id, d.SubBigData)[id];
}
如果我能在得到结果之前过滤它,那就太理想了。因此,如果 GetDataFromSomewhere
可以查询,请在那里查询您的数据。
总结
我做了一个测试以显示实际结果:
On a collection of 1 000 000 records, get 1 000 records by ID
您示例中的 Linq
表达式用了 5.7
秒。
一个简化的 Linq
表达式用了 6.6
秒。
词典转换和检索花费了 26
秒。
有趣的是,如果您将整个集合转换为词典并将其用作新来源,则需要 2.7
秒才能获得 1000 条记录。
所以,这里的字典转换更快。 不过,我不确定这是否是一个充分的测试。
代码
using System.Diagnostics;
#region Create and populate collection
List<TestObject> listCollection = new();
for (int i = 0; i < 1000000; i++)
{
listCollection.Add(new TestObject()
{
Id = i,
Name = $"Name{i}"
});
}
#endregion Create and populate collection
Stopwatch stopwatch = Stopwatch.StartNew();
Random random = new();
#region Get using Linq
for (int i = 0; i < 1000; i++)
{
int id = random.Next(0, 2000000);
_ = listCollection.Where(c => c.Id == id).Select(c => c.Name).FirstOrDefault();
}
Console.WriteLine($"Took {stopwatch.Elapsed} to get 1000 items from list"); //Took 00:00:05.7212037
#endregion Get using Linq
#region Get using Linq (Shorter Expression, takes longer to execute)
stopwatch.Restart();
for (int i = 0; i < 1000; i++)
{
int id = random.Next(0, 2000000);
_ = listCollection.FirstOrDefault(c => c.Id == id)?.Name;
}
Console.WriteLine($"Took {stopwatch.Elapsed} to get 1000 items from list"); //Took 00:00:06.6697507
#endregion Get using Linq (Shorter Expression, takes longer to execute)
#region Get using Dictionary
stopwatch.Restart();
for (int i = 0; i < 1000; i++)
{
try
{
int id = random.Next(0, 2000000);
_ = listCollection.ToDictionary(c => c.Id, c => c.Name)[id];
}
catch {
//Ignore errors when trying to get a value from the Dictionary that does not have a matching key
}
}
Console.WriteLine($"Took {stopwatch.Elapsed} to get 1000 items from dictionary"); //Took 00:00:26.0257818
#endregion Get using Dictionary
#region Get using cached Dictionary
stopwatch.Restart();
var cachedDictionary = listCollection.ToDictionary(c => c.Id, c => c.Name);
for (int i = 0; i < 1000; i++)
{
try
{
int id = random.Next(0, 2000000);
_ = cachedDictionary[id];
}
catch
{
//Ignore errors when trying to get a value from the Dictionary that does not have a matching key
}
}
Console.WriteLine($"Took {stopwatch.Elapsed} to get 1000 items from cached dictionary"); //Took 00:00:02.7144331
#endregion Get using cached Dictionary
stopwatch.Stop();
public class TestObject
{
public int Id { get; set; }
public string Name { get; set; }
}