How to optimize code using DataTable and LINQ?

Question

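I have 2 DataTables, with about 17,000 (table1) and 100,000 (table2) records.

I need to check whether the "FooName" field contains "ItemName". I also need to take "FooId" and then add "ItemId" and "FooId" to a ConcurrentDictionary.

This is the code I have.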

DataTable table1;
DataTable table2;
var table1Select = table1.Select();

ConcurrentDictionary<double, double> compareDictionary = new ConcurrentDictionary<double, double>();

// Iterate the DataRow[] from Select(); a DataTable cannot be enumerated directly.
foreach (var item in table1Select)
{
    var fooItem = from foo in table2.AsEnumerable()
                  where foo.Field<string>("FooName").Contains(item.Field<string>("ItemName"))
                  select foo.Field<double>("FooId");
    if (fooItem != null && fooItem.FirstOrDefault() != 0)
    {
        compareDictionary.TryAdd(item.Field<double>("ItemId"), fooItem.FirstOrDefault());
    }
}

It runs slowly (the task takes about 10 minutes).

I want to make it faster. How can I optimize it?

Answer 1

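I see three points you could attack:

Drop the strongly typed Field<T> accessors in favor of direct casts: the accessor forces unboxing, which you could avoid altogether since doubles are value types. Update: as pointed out in the comments, you cannot avoid unboxing either way, but you might save some method-call overhead (which is, again, debatable). This point can probably be ignored.
Cache the reference string, so it is read only once per outer-loop iteration.
(I think this is the biggest win.) Since you always seem to take the first result, call FirstOrDefault() directly in the LINQ expression, so it does not keep enumerating the whole table once a match has been found.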

ConcurrentDictionary<double, double> compareDictionary = new ConcurrentDictionary<double, double>();

foreach (DataRow item in table1.Rows)
{
    // Cache the value before looping through the inner collection.
    var sample = (string)item["ItemName"];

    // Only the first match is ever used, so let LINQ stop as soon as one is found.
    var fooItem = table2.AsEnumerable()
                        .FirstOrDefault(foo => ((string)foo["FooName"]).Contains(sample));

    if (fooItem != null && (double)fooItem["FooId"] != 0)
    {
        compareDictionary.TryAdd((double)item["ItemId"], (double)fooItem["FooId"]);
    }
}
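To make the early exit from the third point concrete, here is a minimal sketch that counts how often the predicate runs. It uses Enumerable.Range instead of a DataTable, and the class and method names (EarlyExitDemo, CountPredicateCalls, CountCallsWhenQueryIsReused) are made up for illustration. The second method mirrors the question's pattern of calling FirstOrDefault() twice on the same deferred query, which runs the search again each time.

```csharp
using System;
using System.Linq;

public static class EarlyExitDemo
{
    // Returns how many times the predicate ran before FirstOrDefault returned.
    public static int CountPredicateCalls(int size, int target)
    {
        int calls = 0;
        _ = Enumerable.Range(1, size)
                      .FirstOrDefault(x => { calls++; return x == target; });
        return calls;
    }

    // Hypothetical mirror of the question's pattern: one deferred query, consumed twice.
    public static int CountCallsWhenQueryIsReused(int size, int target)
    {
        int calls = 0;
        var query = Enumerable.Range(1, size)
                              .Where(x => { calls++; return x == target; });

        if (query.FirstOrDefault() != 0)   // first pass over the source
        {
            _ = query.FirstOrDefault();    // second pass: the query is evaluated again
        }
        return calls;
    }
}
```

With a source of 1000 elements and a target of 3, the first method should report 3 predicate calls rather than 1000, and the second should report 6, because the deferred query is evaluated once per FirstOrDefault() call.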

It seems that applying .FirstOrDefault() to LINQ query syntax more or less reduces it to method-chain syntax anyway, so I would go with method chaining throughout and leave the aesthetics for you to work out.
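For comparison, here is a small sketch of the same lookup written both ways. Query syntax has no keyword for "take the first", so the FirstOrDefault() call has to wrap the whole query in parentheses. The class name, the helper BuildSampleTable, and the sample rows are invented for illustration; only the column names come from the question.

```csharp
using System;
using System.Data;
using System.Linq;

public static class QuerySyntaxDemo
{
    // Builds a tiny stand-in for table2 (the rows are made-up sample data).
    public static DataTable BuildSampleTable()
    {
        var table = new DataTable();
        table.Columns.Add("FooName", typeof(string));
        table.Columns.Add("FooId", typeof(double));
        table.Rows.Add("red apple", 1.0);
        table.Rows.Add("green apple", 2.0);
        table.Rows.Add("banana", 3.0);
        return table;
    }

    // Query syntax: the method call wraps the parenthesized query.
    public static DataRow FirstMatchQuerySyntax(DataTable table, string sample) =>
        (from foo in table.AsEnumerable()
         where foo.Field<string>("FooName").Contains(sample)
         select foo).FirstOrDefault();

    // Method syntax: the predicate goes straight into FirstOrDefault.
    public static DataRow FirstMatchMethodSyntax(DataTable table, string sample) =>
        table.AsEnumerable()
             .FirstOrDefault(foo => foo.Field<string>("FooName").Contains(sample));
}
```

Both methods return the same DataRow for a matching sample and null when nothing matches, so the choice between them is mostly a matter of readability.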

Answer 2

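If you are willing to trade memory for speed, converting the DataTables up front into just the fields you need is about 6x faster than repeatedly pulling the column data out of table2. (This is on top of the speedup from using FirstOrDefault.)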

var compareDictionary = new ConcurrentDictionary<double, double>();

// Materialize table2 once as a list of (FooName, FooId) tuples.
var t2e = table2.AsEnumerable().Select(r => (FooName: r.Field<string>("FooName"), FooId: r.Field<double>("FooId"))).ToList();
foreach (var item in table1.AsEnumerable().Select(r => (ItemName: r.Field<string>("ItemName"), ItemId: r.Field<double>("ItemId")))) {
    // With no match, FirstOrDefault returns the default tuple, whose FooId is 0.0.
    var firstFooId = t2e.FirstOrDefault(foo => foo.FooName.Contains(item.ItemName)).FooId;

    if (firstFooId != 0.0) {
        compareDictionary.TryAdd(item.ItemId, firstFooId);
    }
}

I am using C# ValueTuples to avoid the overhead of anonymous classes, which are reference objects.
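The distinction can be checked with a short sketch. The class and method names below are hypothetical, and the sample values are made up. It also shows why reading .FooId straight off the FirstOrDefault result is safe with tuples: when nothing matches, the default tuple is returned and its FooId is 0.0, whereas a reference type would yield null.

```csharp
using System;
using System.Linq;

public static class TupleVsAnonymousDemo
{
    // A named tuple is a System.ValueTuple, which is a struct.
    public static bool TupleIsValueType()
    {
        var row = (FooName: "red apple", FooId: 1.0);
        return row.GetType().IsValueType;
    }

    // An anonymous type is compiled to a class, so each row is a heap object.
    public static bool AnonymousIsValueType()
    {
        var row = new { FooName = "red apple", FooId = 1.0 };
        return row.GetType().IsValueType;
    }

    // With no match, FirstOrDefault yields default((string, double)),
    // so FooId reads as 0.0 instead of throwing on a null reference.
    public static double FooIdWhenNoMatch()
    {
        var rows = new[] { (FooName: "red apple", FooId: 1.0) };
        return rows.FirstOrDefault(r => r.FooName.Contains("kiwi")).FooId;
    }
}
```

Here TupleIsValueType() returns true, AnonymousIsValueType() returns false, and FooIdWhenNoMatch() returns 0.0. That last value is what the answer's check against 0.0 relies on.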
