Closure semantics for foreach over arrays of pointer types

In C# 5, the closure semantics of the foreach statement (when the iteration variable is "captured" or "closed over" by anonymous functions) was famously changed (link to thread on that topic).

Question: Was it the intention to change this for arrays of pointer types as well?

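The reason I ask is that, compared to foreach over other collections, the "expansion" of a foreach statement over a pointer array has to be written differently for technical reasons: we cannot use the Current property of System.Collections.IEnumerator, because that property has the declared type object, which is incompatible with a pointer type. The relevant section of the C# Language Specification, "Pointer arrays", in version 5.0, says that: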

foreach (V v in x) EMBEDDED-STATEMENT

is expanded to:

{
  T[,,…,] a = x;
  V v;
  for (int i0 = a.GetLowerBound(0); i0 <= a.GetUpperBound(0); i0++)
  for (int i1 = a.GetLowerBound(1); i1 <= a.GetUpperBound(1); i1++)
  …
  for (int iN = a.GetLowerBound(N); iN <= a.GetUpperBound(N); iN++) {
    v = (V)a.GetValue(i0,i1,…,iN);
    EMBEDDED-STATEMENT
  }
}

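We note that the declaration V v; is located outside all the for loops. So it would appear that the closure semantics are still the "old" C# 4 flavor: the loop variable is reused, and the loop variable is "outer" with respect to the loop.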
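To illustrate why the position of the declaration matters, here is a minimal sketch that writes both shapes of the expansion by hand. It uses a plain int[] instead of a pointer array so that it compiles without the unsafe switch; the class and method names are made up for this illustration and do not come from the specification.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClosureScopeDemo
{
    // "Old" shape: v is declared once, outside the loop,
    // so every lambda captures the same variable.
    public static List<int> CaptureWithOuterVariable(int[] a)
    {
        var funcs = new List<Func<int>>();
        int v;
        for (int i = 0; i < a.Length; i++)
        {
            v = a[i];
            funcs.Add(() => v);
        }
        // The delegates run only after the loop has finished.
        return funcs.Select(f => f()).ToList();
    }

    // "New" shape: v is declared inside the loop body,
    // so each iteration gets a fresh variable to capture.
    public static List<int> CaptureWithInnerVariable(int[] a)
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < a.Length; i++)
        {
            int v = a[i];
            funcs.Add(() => v);
        }
        return funcs.Select(f => f()).ToList();
    }

    public static void Main()
    {
        int[] values = { 0, 2, 4, 200 };
        Console.WriteLine(string.Join(", ", CaptureWithOuterVariable(values))); // 200, 200, 200, 200
        Console.WriteLine(string.Join(", ", CaptureWithInnerVariable(values))); // 0, 2, 4, 200
    }
}
```

The only difference between the two methods is where v is declared, which is exactly the detail in question for the pointer-array expansion.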

To make it clear what I am talking about, consider this complete C# 5 program:

using System;
using System.Collections.Generic;

static class Program
{
  unsafe static void Main()
  {
    char* zeroCharPointer = null;
    char*[] arrayOfPointers =
      { zeroCharPointer, zeroCharPointer + 1, zeroCharPointer + 2, zeroCharPointer + 100, };

    var list = new List<Action>();

    // foreach through pointer array, capture each foreach variable 'pointer' in a lambda
    foreach (var pointer in arrayOfPointers)
      list.Add(() => Console.WriteLine("Pointer address is {0:X2}.", (long)pointer));

    Console.WriteLine("List complete");
    // invoke those delegates
    foreach (var act in list)
      act();
  }

  // Possible output:
  //
  // List complete
  // Pointer address is 00.
  // Pointer address is 02.
  // Pointer address is 04.
  // Pointer address is C8.
  //
  // Or:
  //
  // List complete
  // Pointer address is C8.
  // Pointer address is C8.
  // Pointer address is C8.
  // Pointer address is C8.
}

So what is the correct output of the above program?

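I suppose the specification simply was not updated in this part (about pointer arrays) to reflect that the V variable moves to the inner scope too. If you compile your example with the C# 5 compiler and look at the output, it looks like the expansion in the specification (with array access instead of GetValue, as you correctly pointed out in your comment), except that the V variable is inside all the for loops. The output will be 00-02-04-C8, but of course you know all that yourself :)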

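Long story short: of course I cannot know whether this was intentional, but my guess is that the intent was to move the variable to the inner scope for all foreach loops, including those over pointer arrays, and that the specification just was not updated to reflect that.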

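I have contacted Mads Torgersen, the C# Language PM, and it seems they simply forgot to update this part of the specification. His exact answer was (I asked why the specification was not updated):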

because I forgot! :-) I now have in latest draft, and submitted to ECMA. Thanks!

So it seems the C# 5 behavior is the same for pointer arrays too, which is why you see the first output, and that is the correct one.

The code above compiles (C# 5.0) to the following IL code (see the comments in the code):

.method private hidebysig static void Main() cil managed
{
    .entrypoint
    .maxstack 6
    .locals init (
        [0] char* chPtr,
        [1] char*[] chPtrArray,
        [2] class [mscorlib]System.Collections.Generic.List`1<class [mscorlib]System.Action> list,
        [3] char*[] chPtrArray2,
        [4] int32 num,
        [5] class ConsoleTests.Program/<>c__DisplayClass0_0 class_,
        [6] valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator`0<class [mscorlib]System.Action> enumerator,
        [7] class [mscorlib]System.Action action)
    L_0000: nop 
    L_0001: ldc.i4.0 //{{{{{
    L_0002: conv.u  //chPtr = null;
    L_0003: stloc.0 //}}}}}
    L_0004: ldc.i4.4 //{{{{{
    L_0005: newarr char* //Creates a new char*[4]}}}}}
    L_000a: dup //{{{{{
    L_000b: ldc.i4.0 // Sets the first element in the new
    L_000c: ldloc.0 // char*[] to chPtr.
    L_000d: stelem.i //}}}}}
    L_000e: dup //{{{{{
    L_000f: ldc.i4.1 //
    L_0010: ldloc.0 // Sets the second element of the
    L_0011: ldc.i4.2 // char*[] to chPtr + 1 
    L_0012: add // (loads 2 instead of 1 because char is UTF-16)
    L_0013: stelem.i //}}}}}
    L_0014: dup //{{{{{
    L_0015: ldc.i4.2 // 
    L_0016: ldloc.0 //
    L_0017: ldc.i4.2 // Sets the third element of the
    L_0018: conv.i // char*[] to chPtr + 2
    L_0019: ldc.i4.2 // (loads 4 instead of 2 because char is UTF-16)
    L_001a: mul //
    L_001b: add //
    L_001c: stelem.i //}}}}}
    L_001d: dup //{{{{{
    L_001e: ldc.i4.3 //
    L_001f: ldloc.0 //
    L_0020: ldc.i4.s 100 // Sets the fourth element of the
    L_0022: conv.i // char*[] to chPtr + 100
    L_0023: ldc.i4.2 // (loads 200 instead of 100 because char is UTF-16)
    L_0024: mul //
    L_0025: add //
    L_0026: stelem.i // }}}}}
    L_0027: stloc.1 // chPtrArray = the new array that we have just filled.
    L_0028: newobj instance void [mscorlib]System.Collections.Generic.List`1<class [mscorlib]System.Action>::.ctor() //{{{{{
    L_002d: stloc.2 // list = new List<Action>()
    L_002e: nop //}}}}}
    L_002f: ldloc.1 //{{{{{
    L_0030: stloc.3 //chPtrArray2 = chPtrArray}}}}}
    L_0031: ldc.i4.0 //for (int num = 0; num < chPtrArray2.Length; num++)
    L_0032: stloc.s num //
    L_0034: br.s L_0062 //<<<<< (for start)
    L_0036: newobj instance void ConsoleTests.Program/<>c__DisplayClass0_0::.ctor() //{{{{{
    L_003b: stloc.s class_ //class_ = new temporary compile-time class
    L_003d: ldloc.s class_ //}}}}}
    L_003f: ldloc.3 //{{{{{
    L_0040: ldloc.s num //
    L_0042: ldelem.i //
    L_0043: stfld char* ConsoleTests.Program/<>c__DisplayClass0_0::pointer //class_.pointer = chPtrArray2[num]}}}}}
    L_0048: ldloc.2 //{{{{{
    L_0049: ldloc.s class_ //
    L_004b: ldftn instance void ConsoleTests.Program/<>c__DisplayClass0_0::<Main>b__0() // list.Add(class_.<Main>b__0);
    L_0051: newobj instance void [mscorlib]System.Action::.ctor(object, native int) // (Adds the temporary compile-time class action, which has the correct pointer since
    L_0056: callvirt instance void [mscorlib]System.Collections.Generic.List`1<class [mscorlib]System.Action>::Add(!0) //it is a specific class instance for this iteration, to the list)}}}}}
    L_005b: nop 
    L_005c: ldloc.s num //practically the end of the for
    L_005e: ldc.i4.1 // (actually increasing num and comparing)
    L_005f: add //
    L_0060: stloc.s num //
    L_0062: ldloc.s num //
    L_0064: ldloc.3 //
    L_0065: ldlen //
    L_0066: conv.i4 //
    L_0067: blt.s L_0036 //>>>>> (for complete)
    L_0069: ldstr "List complete" //Printing and stuff.....
    L_006e: call void [mscorlib]System.Console::WriteLine(string)
    L_0073: nop 
    L_0074: nop 
    L_0075: ldloc.2 
    L_0076: callvirt instance valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator`0<!0> [mscorlib]System.Collections.Generic.List`1<class [mscorlib]System.Action>::GetEnumerator()
    L_007b: stloc.s enumerator
    L_007d: br.s L_0090
    L_007f: ldloca.s enumerator
    L_0081: call instance !0 [mscorlib]System.Collections.Generic.List`1/Enumerator`0<class [mscorlib]System.Action>::get_Current()
    L_0086: stloc.s action
    L_0088: ldloc.s action
    L_008a: callvirt instance void [mscorlib]System.Action::Invoke()
    L_008f: nop 
    L_0090: ldloca.s enumerator
    L_0092: call instance bool [mscorlib]System.Collections.Generic.List`1/Enumerator`0<class [mscorlib]System.Action>::MoveNext()
    L_0097: brtrue.s L_007f
    L_0099: leave.s L_00aa
    L_009b: ldloca.s enumerator
    L_009d: constrained. [mscorlib]System.Collections.Generic.List`1/Enumerator`0<class [mscorlib]System.Action>
    L_00a3: callvirt instance void [mscorlib]System.IDisposable::Dispose()
    L_00a8: nop 
    L_00a9: endfinally 
    L_00aa: ret 
    .try L_007d to L_009b finally handler L_009b to L_00aa
}

As you can see, a class called <>c__DisplayClass0_0 is generated at compile time. It contains the method behind your Action and a char* value. The class looks like this:

[CompilerGenerated]
private sealed class <>c__DisplayClass0_0
{
    // Fields
    public unsafe char* pointer;

    // Methods
    internal unsafe void <Main>b__0()
    {
        Console.WriteLine("Pointer address is {0:X2}.", (long) ((ulong) this.pointer));
    }
}

From the MSIL code we can see that the foreach is compiled into the following for loop:

shallowCloneOfArray = arrayOfPointers; // copies the array reference only, not the elements
for (int num = 0; num < shallowCloneOfArray.Length; num++)
{
    <>c__DisplayClass0_0 temp = new <>c__DisplayClass0_0();
    temp.pointer = shallowCloneOfArray[num];
    list.Add(temp.<Main>b__0); //Adds the action to the list of actions
}

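This means that the value of the pointer is actually copied while the loop iterates and the delegate is created, so the value the pointer has at that moment is the value that will be printed (in other words: each action comes from its own instance of <>c__DisplayClass0_0 and receives its own temporary copy of the pointer).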
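To make the role of the per-iteration instance concrete, here is a rough hand-written equivalent of that lowering. It is only a sketch: PointerClosure is a hypothetical stand-in for the compiler-generated <>c__DisplayClass0_0, a long field replaces the char* so no unsafe switch is needed, and the Shared method shows what would happen if a single instance were hoisted out of the loop instead.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the compiler-generated display class.
public sealed class PointerClosure
{
    public long pointer; // assumption: long used in place of char*

    public string Describe()
    {
        return string.Format("Pointer address is {0:X2}.", pointer);
    }
}

public static class LoweringDemo
{
    // One closure object per iteration, as in the IL above.
    public static List<Func<string>> PerIteration(long[] addresses)
    {
        var list = new List<Func<string>>();
        for (int num = 0; num < addresses.Length; num++)
        {
            var temp = new PointerClosure();
            temp.pointer = addresses[num];
            list.Add(temp.Describe);
        }
        return list;
    }

    // A single closure object created before the loop:
    // every delegate ends up reading the last value stored.
    public static List<Func<string>> Shared(long[] addresses)
    {
        var list = new List<Func<string>>();
        var temp = new PointerClosure();
        for (int num = 0; num < addresses.Length; num++)
        {
            temp.pointer = addresses[num];
            list.Add(temp.Describe);
        }
        return list;
    }

    public static void Main()
    {
        long[] addresses = { 0, 2, 4, 200 };
        foreach (var f in PerIteration(addresses))
            Console.WriteLine(f()); // 00, 02, 04, C8
        foreach (var f in Shared(addresses))
            Console.WriteLine(f()); // C8 four times
    }
}
```

The only difference between the two methods is where the closure object is allocated, which mirrors the position of the newobj instruction inside the loop in the IL listing.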

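As we have just seen, the only variable set up before the loop and "reused" is the copy of the array reference itself; the captured pointer is not reused. So if the specification says what you quoted, then it is wrong, because the expansion you attached suggests that the output should be C8 C8 C8 C8 (a single v shared by every delegate). The actual result: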

List complete
Pointer address is 00.
Pointer address is 02.
Pointer address is 04.
Pointer address is C8.