Does IEnumerable<T> store a function to be called later?

I recently came across some code that didn't behave the way I expected.

1: int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8 };
2: IEnumerable<int> result = numbers.Select(n => n % 2 == 0 ? n : 0);
3: 
4: int a = result.ElementAt(0);
5: numbers[0] = 10;
6: int b = result.ElementAt(0);

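When I stepped through this code in Visual Studio, I was surprised to see the yellow highlight jump from line 4 back to the lambda expression on line 2, and then again from line 6 to the lambda on line 2.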

Also, after running this code, the value of a is 0 and the value of b is 10.

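The original code that made me realize this could/would happen involved a method call inside Select(), and accessing any property or specific element of the IEnumerable caused the method inside Select() to be called again and again.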

// The following code prints out:
// Doing something... 1
// Doing something... 5
// Doing something... 1
// Doing something... 2
// Doing something... 3
// Doing something... 4
// Doing something... 5

using System;
using System.Linq;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        int[] numbers = { 1, 2, 3, 4, 5 };
        IEnumerable<int> result = numbers.Select(DoSomething);

        int a = result.ElementAt(0);
        int b = result.ElementAt(4);
        int c = result.Count();
    }

    static int DoSomething(int x)
    {
        Console.WriteLine("Doing something... " + x);
        return x;
    }
}

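I feel like I now understand how the code behaves (and I have found other questions online that are a result of this behavior). However, what exactly causes the code within the Select() to be called from later lines?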

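You have a reference to a LINQ query, which is evaluated again each time you iterate over it.

From the docs (you can see that this is called deferred execution):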

As stated previously, the query variable itself only stores the query commands. The actual execution of the query is deferred until you iterate over the query variable in a foreach statement. This concept is referred to as deferred execution

...

Because the query variable itself never holds the query results, you can execute it as often as you like. For example, you may have a database that is being updated continually by a separate application. In your application, you could create one query that retrieves the latest data, and you could execute it repeatedly at some interval to retrieve different results every time.

So, when you have

IEnumerable<int> result = numbers.Select(DoSomething);

you have a reference to a query that will transform each element of numbers into the result of DoSomething.
So, you can say that this:

int a = result.ElementAt(0);

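iterates over result up to the first element. The same happens for ElementAt(4), but this time the target is the fifth element. Note that only Doing something... 5 is printed for that call. That matches .NET Core, where the array-backed Select iterator can index straight into the source and run the selector for just the requested element; on .NET Framework you would likely see the selector run for elements 1 through 5 instead. The call would fail if the query could not produce 5 items at that moment.
The .Count() call iterates the result query again and returns the number of elements at that moment.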

If, instead of keeping a reference to the query, you keep a reference to its results, i.e.:

IEnumerable<int> result = numbers.Select(DoSomething).ToArray();
// or
IEnumerable<int> result = numbers.Select(DoSomething).ToList();

you would only see this output:

// Doing something... 1
// Doing something... 2
// Doing something... 3
// Doing something... 4
// Doing something... 5
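
To make the query-versus-results distinction concrete, here is a minimal sketch. The names DeferredDemo, Track and Calls are made up for illustration and are not part of the question's code. It counts selector invocations and shows that a ToList() snapshot does not see a later change to the array, while the query does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Hypothetical counter: how many times the selector has run so far.
    public static int Calls;

    public static int Track(int x)
    {
        Calls++;
        return x;
    }

    public static void Main()
    {
        int[] numbers = { 1, 2, 3, 4, 5 };

        // Building the query runs nothing yet.
        IEnumerable<int> query = numbers.Select(Track);
        Console.WriteLine(Calls);         // 0

        // Materializing runs the selector once per element.
        List<int> snapshot = query.ToList();
        Console.WriteLine(Calls);         // 5

        numbers[0] = 10;

        // The snapshot is a copy, so it does not see the change...
        Console.WriteLine(snapshot[0]);   // 1
        Console.WriteLine(Calls);         // still 5

        // ...while the query reads the array again when enumerated.
        Console.WriteLine(query.First()); // 10
        Console.WriteLine(Calls);         // 6
    }
}
```

Reading snapshot[0] costs no selector call at all, which is why keeping the results is the usual choice when the selector is expensive or has side effects.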

Let's break this down piece by piece until you understand it. Trust me; take your time and read this, and it will be a revelation for understanding Enumerable types and will answer your question.

Look at the IEnumerable interface which is the base of IEnumerable<T>. It contains one method; IEnumerator GetEnumerator();.

Enumerables are a tricky beast because they can do whatever they want. All that really matters is the call to the GetEnumerator() that happens automatically in a foreach loop; or you can do it manually.

What does GetEnumerator() do? It returns another interface, IEnumerator.

This is the magic. The IEnumerator has 1 property and 2 methods.

object Current { get; }
bool MoveNext();
void Reset();

Let's break down the magic.

First let me explain what they are typically, and I say typically because like I mentioned it can be a tricky beast. You're allowed to implement this however you choose... Some types don't follow the standards.

object Current { get; } is obvious. It gets the current object in the IEnumerator; by default this might be null.

bool MoveNext(); This returns true if there is another object in the IEnumerator and it should set the Current value to that new object.

void Reset(); tells the type to start over from the beginning.

Now let's implement this. Please take the time to review this IEnumerator type so that you understand it. Realize that when you reference an IEnumerable type you are not even referencing the IEnumerator (this); rather, you're referencing a type that returns this IEnumerator via GetEnumerator().

Note: Be careful not to confuse the names. IEnumerator is different from IEnumerable.

IEnumerator

using System.Collections; // non-generic IEnumerator / IEnumerable live here

public class MyEnumerator : IEnumerator
{
    private string First => nameof(First);
    private string Second => nameof(Second);
    private string Third => nameof(Third);
    private int counter = 0;

    public object Current { get; private set; }

    public bool MoveNext()
    {
        if (counter > 2) return false;

        counter++;
        switch (counter)
        {
            case 1:
                Current = First;
                break;
            case 2:
                Current = Second;
                break;
            case 3:
                Current = Third;
                break;                    
        }
        return true;
    }

    public void Reset()
    {
        counter = 0;
    }
}

Now, let's make an IEnumerable type and use this IEnumerator.

IEnumerable

public class MyEnumerable : IEnumerable
{
    public IEnumerator GetEnumerator() => new MyEnumerator();
}

This is something to soak in... When you make a call like numbers.Select(n => n % 2 == 0 ? n : 0) you aren't iterating any items... you're returning a type much like the one above. .Select(…) returns IEnumerable<int>. Well looky above... IEnumerable isn't anything but an interface that exposes GetEnumerator(). That method gets called whenever you enter a looping situation, or it can be called manually. So, with that in mind you can already see the iteration never starts until you call GetEnumerator(), and even then it never starts until you call the MoveNext() method of the result of GetEnumerator(), which is the IEnumerator type.

So...

In other words, you just have a reference to an IEnumerable<T> in your call and nothing more. No iterations have taken place. This is why the code jumps back up in yours: it finally does iterate in the ElementAt method, and it's then looking at the lambda expression. Stay with me and I'll later update an example to take this lesson full circle, but for now let's continue our simple example:

Let's now make a simple console app to test our new types.

Console App

class Program
{
    static void Main(string[] args)
    {
        var myEnumerable = new MyEnumerable();

        foreach (var item in myEnumerable)
            Console.WriteLine(item);

        Console.ReadKey();
    }

    // OUTPUT
    // First
    // Second
    // Third
}

Now let's do the same thing but make it generic. I won't write as much but monitor the code closely for changes and you'll get it.

I'm going to copy and paste it all in one.

Entire Console App

using System;
using System.Collections;
using System.Collections.Generic;

namespace Question_Answer_Console_App
{
    class Program
    {
        static void Main(string[] args)
        {
            var myEnumerable = new MyEnumerable<Person>();

            foreach (var person in myEnumerable)
                Console.WriteLine(person.Name);

            Console.ReadKey();
        }

        // OUTPUT
        // Test 0
        // Test 1
        // Test 2
    }

    public class Person
    {
        static int personCounter = 0;
        public string Name { get; } = "Test " + personCounter++;
    }

    public class MyEnumerator<T> : IEnumerator<T>
    {
        private T First { get; set; }
        private T Second { get; set; }
        private T Third { get; set; }
        private int counter = 0;

        // Non-generic view of the same item; casting it to IEnumerator<T> would throw at runtime.
        object IEnumerator.Current => Current;
        public T Current { get; private set; }

        public bool MoveNext()
        {
            if (counter > 2) return false;

            counter++;
            switch (counter)
            {
                case 1:
                    First = Activator.CreateInstance<T>();
                    Current = First;
                    break;
                case 2:
                    Second = Activator.CreateInstance<T>();
                    Current = Second;
                    break;
                case 3:
                    Third = Activator.CreateInstance<T>();
                    Current = Third;
                    break;
            }
            return true;
        }

        public void Reset()
        {
            counter = 0;
            First = default;
            Second = default;
            Third = default;
        }

        public void Dispose() => Reset();
    }

    public class MyEnumerable<T> : IEnumerable<T>
    {
        IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
        public IEnumerator<T> GetEnumerator() => new MyEnumerator<T>();
    }
}

So let's recap... IEnumerable<T> is a type that has a method that returns an IEnumerator<T> type. The IEnumerator<T> type has the T Current { get; } property as well as the IEnumerator methods.

Let's break this down one more time in code and call out the pieces manually so that you can see it clearer. This will be only the console part of the app because everything else stays the same.

Console App

class Program
{
    static void Main(string[] args)
    {
        IEnumerable<Person> enumerable = new MyEnumerable<Person>();
        IEnumerator<Person> enumerator = enumerable.GetEnumerator();

        while (enumerator.MoveNext())
            Console.WriteLine(enumerator.Current.Name);

        Console.ReadKey();
    }
    // OUTPUT
    // Test 0
    // Test 1
    // Test 2
}
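
To bring this back to the code in the question, here is a rough sketch that drives the question's Select query by hand. The ManualSelectDemo class and its Run helper are hypothetical names added only for this illustration. It logs when the lambda actually runs relative to GetEnumerator() and MoveNext().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ManualSelectDemo
{
    // Returns a log of what happened, in order.
    public static List<string> Run()
    {
        var log = new List<string>();
        int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8 };

        IEnumerable<int> result = numbers.Select(n =>
        {
            log.Add("lambda(" + n + ")");
            return n % 2 == 0 ? n : 0;
        });
        log.Add("query built");

        // Roughly what ElementAt(0) has to do: get an enumerator, advance once, read Current.
        using (IEnumerator<int> e = result.GetEnumerator())
        {
            log.Add("enumerator created");
            e.MoveNext();                       // the lambda runs here
            log.Add("Current = " + e.Current);
        }

        numbers[0] = 10;

        // A fresh enumerator reads the source array again.
        using (IEnumerator<int> e = result.GetEnumerator())
        {
            e.MoveNext();
            log.Add("Current = " + e.Current);
        }

        return log;
    }

    public static void Main()
    {
        foreach (string line in Run())
            Console.WriteLine(line);
    }
}
```

The log comes out as: query built, enumerator created, lambda(1), Current = 0, lambda(10), Current = 10. The lambda entries appear only after MoveNext(), which is the jump back to line 2 that the debugger shows.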

FYI: One thing to point out is that the answer above touches on two flavors of LINQ. LINQ in EF or LINQ-to-SQL uses different extension methods than typical LINQ (LINQ to Objects). The main difference is that a query expression in LINQ (when referring to a database) returns IQueryable<T>, which implements the IQueryable interface and builds expressions that are translated to SQL, run, and then iterated. In other words... something like a .Where(…) clause doesn't query the entire database and then iterate over it. It turns that expression into a SQL expression. That's why some things, such as certain .Equals() calls, may not work in those specific lambda expressions.
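
As a small illustration of that difference, the sketch below contrasts a delegate with an expression tree. It uses the in-memory AsQueryable() provider only as a stand-in, since no database is involved; QueryableDemo and BuildQuery are hypothetical names. A real provider such as EF would walk the same kind of tree and translate it to SQL.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class QueryableDemo
{
    // Queryable.Where receives the lambda as an expression tree, not as compiled code.
    public static IQueryable<int> BuildQuery(int[] source)
    {
        Expression<Func<int, bool>> predicate = n => n > 2;
        return source.AsQueryable().Where(predicate);
    }

    public static void Main()
    {
        // A delegate is compiled code: all you can do is invoke it.
        Func<int, bool> asDelegate = n => n > 2;
        Console.WriteLine(asDelegate(3));              // True

        // An expression tree is data describing the lambda, so it can be inspected.
        Expression<Func<int, bool>> asExpression = n => n > 2;
        Console.WriteLine(asExpression.Body.NodeType); // GreaterThan

        IQueryable<int> query = BuildQuery(new[] { 1, 2, 3, 4 });
        Console.WriteLine(query.Expression.NodeType);  // Call (the Where call itself is part of the tree)
        Console.WriteLine(query.Count());              // 2
    }
}
```

Because the provider only sees a description of the lambda, anything in it that the provider cannot translate fails at query time rather than at compile time.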

Does IEnumerable<T> store a function to be called later?

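Yes. An IEnumerable is exactly what it says it is. It is something that can be enumerated at some point in the future. You can think of it as setting up a pipeline of operations.

None of those operations is actually invoked until it is enumerated (i.e. by foreach, .ElementAt(), .ToList(), etc.). This is called deferred execution.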

what exactly causes the code within the Select() to be called from later lines?

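When you call SomeEnumerable.Select(SomeOperation), the result is an IEnumerable, which is an object representing the "pipeline" you have set up. The implementation of that IEnumerable does store the function you passed to it. The actual source for this (for .NET Core) is here. You can see that SelectEnumerableIterator, SelectListIterator, and SelectArrayIterator all have a Func<TSource, TResult> as a private field. This is where it stores the function you specified for later use. The array and list iterators just provide some shortcuts for when you know you are iterating over a finite collection.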
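
A heavily simplified sketch of that idea follows. SimpleSelect and SimpleSelectDemo are hypothetical stand-ins written for this answer, not the real iterator classes, and they skip all of the array and list shortcuts. The point is only the private Func field and where it gets invoked.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for the real Select iterators.
public sealed class SimpleSelect<TSource, TResult> : IEnumerable<TResult>
{
    private readonly IEnumerable<TSource> _source;
    private readonly Func<TSource, TResult> _selector; // the stored function

    public SimpleSelect(IEnumerable<TSource> source, Func<TSource, TResult> selector)
    {
        _source = source;
        _selector = selector;
    }

    public IEnumerator<TResult> GetEnumerator()
    {
        // The compiler turns this into an enumerator; the body only runs as MoveNext() is called.
        foreach (TSource item in _source)
            yield return _selector(item);
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class SimpleSelectDemo
{
    // Minimal version of ElementAt(0): advance once and read Current.
    public static T First<T>(IEnumerable<T> items)
    {
        using (IEnumerator<T> e = items.GetEnumerator())
        {
            if (!e.MoveNext()) throw new InvalidOperationException("empty sequence");
            return e.Current;
        }
    }

    public static void Main()
    {
        int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8 };
        var result = new SimpleSelect<int, int>(numbers, n => n % 2 == 0 ? n : 0);

        Console.WriteLine(First(result)); // 0
        numbers[0] = 10;
        Console.WriteLine(First(result)); // 10
    }
}
```

Constructing SimpleSelect only saves the two references; the stored function is invoked each time something enumerates the object, which reproduces the a = 0, b = 10 result from the question.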