What is the difference between MoreLINQ's DistinctBy and Linq's GroupBy

Question

I have two versions of grouping a list of items. The first uses DistinctBy:

List<m_addtlallowsetup> xlist_distincted = xlist_addtlallowsetups.DistinctBy(p => new { p.setupcode, p.allowcode }).OrderBy(y => y.setupcode).ThenBy(z => z.allowcode).ToList();

and the second uses GroupBy:

List<m_addtlallowsetup> grouped = xlist_addtlallowsetups.GroupBy(p => new { p.setupcode, p.allowcode }).Select(grp => grp.First()).OrderBy(y => y.setupcode).ThenBy(z => z.allowcode).ToList();

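The two look the same to me, but there must be a layman's explanation of how they differ, how they perform, and what their drawbacks are.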

Answer 1

Differences

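GroupBy produces a 'group' that contains the key (the grouping criterion) and the values that share it. That is why you need Select(grp => grp.First()) before you get back a flat list of items.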
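To make the shape of a group concrete, here is a minimal sketch. The Setup record and its sample rows are hypothetical stand-ins for the question's m_addtlallowsetup type, and the code assumes C# 9 or later for records.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupingDemo
{
    // Hypothetical stand-in for the question's m_addtlallowsetup type.
    public sealed record Setup(string SetupCode, string AllowCode, int Amount);

    public static List<Setup> Sample() => new List<Setup>
    {
        new Setup("S1", "A1", 10),
        new Setup("S1", "A1", 20), // same key as the row above
        new Setup("S1", "A2", 30),
    };

    // Each group carries its key plus every element that shares it;
    // First() keeps only the earliest element of each group.
    public static List<Setup> FirstPerGroup(IEnumerable<Setup> rows) =>
        rows.GroupBy(r => new { r.SetupCode, r.AllowCode })
            .Select(g => g.First())
            .ToList();

    public static void Main()
    {
        // Show what GroupBy hands back before the Select: a key and its elements.
        foreach (var g in Sample().GroupBy(r => new { r.SetupCode, r.AllowCode }))
            Console.WriteLine($"{g.Key}: {g.Count()} element(s)");

        // After Select(g => g.First()) only one row per key remains.
        foreach (var row in FirstPerGroup(Sample()))
            Console.WriteLine(row);
    }
}
```

The anonymous type used as the key compares by value, which is why the two rows with ("S1", "A1") land in the same group.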

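You might suspect that MoreLinq only provides a shorthand for that. According to the source, though, DistinctBy works in memory by picking every item whose key is new to a HashSet. The HashSet's Add method, which MoreLinq calls, adds the key and returns true only if the key was not already in the set; when it returns true, yield return emits that element into the resulting enumerable.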

Which one?

SQL related

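Based on the difference above, you could say that doing GroupBy and then projecting with Select is the safer approach, because it can be translated into a SQL command if you are using Entity Framework (or, I suppose, Linq2Sql). Being translatable into SQL is a big advantage: it takes load off your application and delegates the operation to the database server.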

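However, you have to understand that GroupBy in Entity Framework actually uses an OUTER JOIN, which is considered a complex operation, and in some cases it may cause your query to be dropped immediately. That is quite rare, though; it did not happen even with a query of mine that had a lot of columns, around four GroupBys, and a bunch of orderings and Wheres.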

Linq to Object

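Roughly speaking, when you are dealing with an enumerable that is already in memory, running GroupBy and then Select may mean your enumerable has to be iterated through two operations. Using DistinctBy from MoreLinq directly can save some of that cost, because it is guaranteed to be a single operation backed by a HashSet, as explained in Mrinal Kamboj's answer, which analyses the source code in depth.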
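One way to see the difference on an in-memory sequence is to count how many source elements each approach pulls before it can hand back its first result. This is only a sketch: DistinctByKey is a hypothetical helper that follows the HashSet idea described above (it is not MoreLinq's own method), and Counted is a hypothetical counting wrapper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreamingDemo
{
    // Hypothetical helper using the HashSet-based approach; not MoreLinq itself.
    public static IEnumerable<T> DistinctByKey<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var seen = new HashSet<TKey>();
        foreach (var item in source)
            if (seen.Add(keySelector(item))) // true only for a key not seen before
                yield return item;
    }

    // Wraps a sequence and counts how many elements a consumer actually pulls.
    public static IEnumerable<int> Counted(IEnumerable<int> source, int[] counter)
    {
        foreach (var item in source)
        {
            counter[0]++;
            yield return item;
        }
    }

    public static int PulledByDistinct()
    {
        var counter = new int[1];
        var data = Counted(Enumerable.Range(0, 1000), counter);
        _ = DistinctByKey(data, x => x % 10).First(); // stops after the first new key
        return counter[0];
    }

    public static int PulledByGroupBy()
    {
        var counter = new int[1];
        var data = Counted(Enumerable.Range(0, 1000), counter);
        _ = data.GroupBy(x => x % 10).Select(g => g.First()).First(); // groups are built first
        return counter[0];
    }

    public static void Main()
    {
        Console.WriteLine(PulledByDistinct()); // prints 1
        Console.WriteLine(PulledByGroupBy());  // prints 1000
    }
}
```

When the whole result is materialised with ToList, both approaches read every source element once; what GroupBy adds on top is building and holding all the groups before the Select runs.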

Answer 2

Let's begin by reviewing the MoreLinq API. The following is the code for DistinctBy:

MoreLinq - DistinctBy

Source Code

public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector, IEqualityComparer<TKey> comparer)
{
    if (source == null) throw new ArgumentNullException(nameof(source));
    if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));

    return _(); IEnumerable<TSource> _()
    {
        var knownKeys = new HashSet<TKey>(comparer);
        foreach (var element in source)
        {
            if (knownKeys.Add(keySelector(element)))
                yield return element;
        }
    }
}

Working

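Internally it uses a HashSet<TKey>, so it only checks for the first match and returns the first element of type TSource for each key; the rest are ignored because the key has already been added to the HashSet
It is the simplest way to get the first element for each unique key in the collection, where the key is defined by the Func<TSource, TKey> keySelector
The use case is limited (a subset of what GroupBy can achieve, which is also clear from your code)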
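The points above, including the role of the comparer parameter, can be sketched as follows. FirstPerKey is a hypothetical helper written to follow the same idea as the listing above, not MoreLinq's actual method, and the word list is made-up sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FirstPerKeyDemo
{
    // Hypothetical helper that follows the HashSet-based idea of the listing above.
    public static IEnumerable<TSource> FirstPerKey<TSource, TKey>(
        IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> comparer = null)
    {
        var knownKeys = new HashSet<TKey>(comparer); // null selects the default comparer
        foreach (var element in source)
        {
            // Add returns false when the key was already seen, so later duplicates are skipped.
            if (knownKeys.Add(keySelector(element)))
                yield return element;
        }
    }

    public static string[] Words() =>
        new[] { "apple", "Avocado", "banana", "blueberry", "cherry" };

    // Key: the first letter of each word, compared case-insensitively.
    public static List<string> FirstWordPerLetter() =>
        FirstPerKey(Words(), w => w.Substring(0, 1), StringComparer.OrdinalIgnoreCase).ToList();

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", FirstWordPerLetter())); // prints "apple, banana, cherry"
    }
}
```

Only the first element seen for each key survives, in source order, and nothing about the skipped elements is kept.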

Enumerable - GroupBy

(Source Code)

public static IEnumerable<IGrouping<TKey, TElement>> GroupBy<TSource, TKey, TElement>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, Func<TSource, TElement> elementSelector) {
    return new GroupedEnumerable<TSource, TKey, TElement>(source, keySelector, elementSelector, null);
}

internal class GroupedEnumerable<TSource, TKey, TElement> : IEnumerable<IGrouping<TKey, TElement>>
{
    IEnumerable<TSource> source;
    Func<TSource, TKey> keySelector;
    Func<TSource, TElement> elementSelector;
    IEqualityComparer<TKey> comparer;

    public GroupedEnumerable(IEnumerable<TSource> source, Func<TSource, TKey> keySelector, Func<TSource, TElement> elementSelector, IEqualityComparer<TKey> comparer) {
        if (source == null) throw Error.ArgumentNull("source");
        if (keySelector == null) throw Error.ArgumentNull("keySelector");
        if (elementSelector == null) throw Error.ArgumentNull("elementSelector");
        this.source = source;
        this.keySelector = keySelector;
        this.elementSelector = elementSelector;
        this.comparer = comparer;
    }

    public IEnumerator<IGrouping<TKey, TElement>> GetEnumerator() {
        return Lookup<TKey, TElement>.Create<TSource>(source, keySelector, elementSelector, comparer).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() {
        return GetEnumerator();
    }
}

Working

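As can be seen, it internally uses a Lookup data structure to aggregate all the data for a given key
It provides flexibility through projection, with element selection and result selection, and so can meet many different use cases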
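As an illustration of that flexibility, the sketch below uses the GroupBy overload that takes both an element selector and a result selector. The Order record and its sample data are hypothetical, and the code assumes C# 9 or later for records.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupByProjectionDemo
{
    // Hypothetical sample type, used only for this illustration.
    public sealed record Order(string Customer, string Item, decimal Price);

    public static List<Order> Sample() => new List<Order>
    {
        new Order("ann", "pen", 2m),
        new Order("bob", "ink", 5m),
        new Order("ann", "pad", 3m),
    };

    // elementSelector keeps only the Price of each order;
    // resultSelector turns each key and its prices into one summary tuple.
    public static List<(string Customer, int Count, decimal Total)> Summaries(IEnumerable<Order> orders) =>
        orders.GroupBy(
                  o => o.Customer,
                  o => o.Price,
                  (customer, prices) => (Customer: customer, Count: prices.Count(), Total: prices.Sum()))
              .ToList();

    public static void Main()
    {
        foreach (var s in Summaries(Sample()))
            Console.WriteLine($"{s.Customer}: {s.Count} order(s), total {s.Total}");
    }
}
```

A per-key summary like this needs every element of each group, which a first-element-only approach such as DistinctBy cannot provide.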

Summary

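MoreLinq - DistinctBy achieves a small subset of what Enumerable - GroupBy can achieve. If your use case is that specific, use the MoreLinq API
For your use case, speed-wise, MoreLinq - DistinctBy should be faster because its scope is limited: unlike Enumerable - GroupBy, DistinctBy does not first aggregate all the data and then select the first element for each unique key; it simply ignores everything after the first record for a key
If the requirement is this specific use case and no data projection is needed, then MoreLinq is the better choice.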

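This is a classic case in Linq where more than one API can provide the same result, but we need to be wary of the cost, since GroupBy is designed for a much wider task than what you are expecting from DistinctBy.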
