Closure semantics for foreach over arrays of pointer types
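In C# 5, the closure semantics of the foreach statement (when the iteration variable is "captured" or "closed over" by an anonymous function) was famously changed (link to thread on that topic).
Question: Was it the intention to change this for arrays of pointer types as well?
The reason I ask is that the "expansion" of a foreach statement has to be rewritten for technical reasons, compared to foreach over other collections: we cannot use the Current property of System.Collections.IEnumerator, because that property has the declared type object, which is incompatible with a pointer type. The relevant section of the C# Language Specification, "Pointer arrays", says in version 5.0 that: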
foreach (V v in x) EMBEDDED-STATEMENT
is expanded to:
{
    T[,,…,] a = x;
    V v;
    for (int i0 = a.GetLowerBound(0); i0 <= a.GetUpperBound(0); i0++)
    for (int i1 = a.GetLowerBound(1); i1 <= a.GetUpperBound(1); i1++)
    …
    for (int iN = a.GetLowerBound(N); iN <= a.GetUpperBound(N); iN++) {
        v = (V)a.GetValue(i0,i1,…,iN);
        EMBEDDED-STATEMENT
    }
}
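Note that the declaration V v; is located outside all of the for loops. So it would appear that the closure semantics is still the "old" C# 4 flavor: the loop variable is reused, and it is "outer" with respect to the loop.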
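The difference between the two placements of the declaration can be sketched without pointers or unsafe code. The following is only an illustration under assumptions: the class and method names (ClosureScopeSketch, CaptureOuter, CaptureInner) are made up for this sketch, and plain long values stand in for the pointer addresses.

```csharp
using System;
using System.Collections.Generic;

public static class ClosureScopeSketch
{
    // "Old" C# 4 flavor: one variable declared outside the loop,
    // shared by every lambda that captures it.
    public static List<Func<string>> CaptureOuter(long[] values)
    {
        var list = new List<Func<string>>();
        long v;
        for (int i = 0; i < values.Length; i++)
        {
            v = values[i];
            list.Add(() => string.Format("{0:X2}", v));
        }
        return list;
    }

    // C# 5 flavor: a fresh variable per iteration,
    // so each lambda captures its own copy.
    public static List<Func<string>> CaptureInner(long[] values)
    {
        var list = new List<Func<string>>();
        for (int i = 0; i < values.Length; i++)
        {
            long v = values[i];
            list.Add(() => string.Format("{0:X2}", v));
        }
        return list;
    }

    public static void Main()
    {
        long[] addresses = { 0, 2, 4, 200 }; // stand-ins for the pointer values

        foreach (var f in CaptureOuter(addresses)) Console.Write(f() + " ");
        Console.WriteLine(); // C8 C8 C8 C8

        foreach (var f in CaptureInner(addresses)) Console.Write(f() + " ");
        Console.WriteLine(); // 00 02 04 C8
    }
}
```

With the declaration outside the loop, every delegate reads the same variable after the loop has finished, so all of them see the last value.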
To make it clear what I am talking about, consider this complete C# 5 program:
using System;
using System.Collections.Generic;
static class Program
{
    unsafe static void Main()
    {
        char* zeroCharPointer = null;
        char*[] arrayOfPointers =
            { zeroCharPointer, zeroCharPointer + 1, zeroCharPointer + 2, zeroCharPointer + 100, };
        var list = new List<Action>();
        // foreach through pointer array, capture each foreach variable 'pointer' in a lambda
        foreach (var pointer in arrayOfPointers)
            list.Add(() => Console.WriteLine("Pointer address is {0:X2}.", (long)pointer));
        Console.WriteLine("List complete");
        // invoke those delegates
        foreach (var act in list)
            act();
    }
    // Possible output:
    //
    // List complete
    // Pointer address is 00.
    // Pointer address is 02.
    // Pointer address is 04.
    // Pointer address is C8.
    //
    // Or:
    //
    // List complete
    // Pointer address is C8.
    // Pointer address is C8.
    // Pointer address is C8.
    // Pointer address is C8.
}
So what is the correct output of the above program?
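I suppose the specification simply was not updated in this part (about pointer arrays) to reflect that the V variable moves into the inner scope as well. If you compile your example with the C# 5 compiler and look at the output, it looks like the expansion in the specification (with array access instead of GetValue, as you correctly pointed out in your comment), except that the V variable is declared inside all of the for loops. The output will be 00-02-04-C8, but of course you know all of that yourself :)
Long story short: I cannot know whether this was intentional, but my guess is that the intent was to move the variable to the inner scope for all foreach loops, including those over pointer arrays, and that the specification just was not updated to reflect it.
I have contacted Mads Torgersen, the C# language PM, and it seems they simply forgot to update this part of the specification. His exact answer was (I had asked why the specification was not updated):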
because I forgot! :-) I now have in latest draft, and submitted to ECMA. Thanks!
So the C# 5 behavior is the same for pointer arrays as well, which is why you see the first output, and that output is the correct one.
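As a rough sketch of what the corrected expansion would look like, here is the nested-loop form with the iteration variable declared inside the innermost loop. This is an assumption-laden illustration rather than the specification's wording: it uses a two-dimensional int array instead of a pointer array so that it compiles without unsafe code, it uses direct array access instead of GetValue, and the names ExpansionSketch and Expand are invented for the example.

```csharp
using System;
using System.Collections.Generic;

public static class ExpansionSketch
{
    // Hand-written expansion of: foreach (int v in x) captured.Add(() => v);
    public static List<Func<int>> Expand(int[,] x)
    {
        var captured = new List<Func<int>>();
        int[,] a = x;
        for (int i0 = a.GetLowerBound(0); i0 <= a.GetUpperBound(0); i0++)
        for (int i1 = a.GetLowerBound(1); i1 <= a.GetUpperBound(1); i1++)
        {
            // Declared inside the innermost loop: a new variable per element.
            int v = a[i0, i1];
            captured.Add(() => v);
        }
        return captured;
    }

    public static void Main()
    {
        var funcs = Expand(new int[,] { { 1, 2 }, { 3, 4 } });
        foreach (var f in funcs)
            Console.WriteLine(f()); // prints 1, 2, 3, 4 on separate lines
    }
}
```

If the declaration of v were hoisted above the outer for loop, as in the version 5.0 text quoted in the question, every delegate would return 4.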
Your code compiles (C# 5.0) to the following IL (explanatory comments are inline):
.method private hidebysig static void Main() cil managed
{
.entrypoint
.maxstack 6
.locals init (
[0] char* chPtr,
[1] char*[] chPtrArray,
[2] class [mscorlib]System.Collections.Generic.List`1<class [mscorlib]System.Action> list,
[3] char*[] chPtrArray2,
[4] int32 num,
[5] class ConsoleTests.Program/<>c__DisplayClass0_0 class_,
[6] valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator`0<class [mscorlib]System.Action> enumerator,
[7] class [mscorlib]System.Action action)
L_0000: nop
L_0001: ldc.i4.0 //{{{{{
L_0002: conv.u //chPtr = null;
L_0003: stloc.0 //}}}}}
L_0004: ldc.i4.4 //{{{{{
L_0005: newarr char* //Creates a new char*[4]}}}}}
L_000a: dup //{{{{{
L_000b: ldc.i4.0 // Sets the first element in the new
L_000c: ldloc.0 // char*[] to chPtr.
L_000d: stelem.i //}}}}}
L_000e: dup //{{{{{
L_000f: ldc.i4.1 //
L_0010: ldloc.0 // Sets the second element of the
L_0011: ldc.i4.2 // char*[] to chPtr + 1
L_0012: add // (loads 2 instead of 1 because char is UTF-16)
L_0013: stelem.i //}}}}}
L_0014: dup //{{{{{
L_0015: ldc.i4.2 //
L_0016: ldloc.0 //
L_0017: ldc.i4.2 // Sets the third element of the
L_0018: conv.i // char*[] to chPtr + 2
L_0019: ldc.i4.2 // (loads 4 instead of 2 because char is UTF-16)
L_001a: mul //
L_001b: add //
L_001c: stelem.i //}}}}}
L_001d: dup //{{{{{
L_001e: ldc.i4.3 //
L_001f: ldloc.0 //
L_0020: ldc.i4.s 100 // Sets the fourth element of the
L_0022: conv.i // char*[] to chPtr + 100
L_0023: ldc.i4.2 // (loads 200 instead of 100 because char is UTF-16)
L_0024: mul //
L_0025: add //
L_0026: stelem.i // }}}}}
L_0027: stloc.1 // chPtrArray = the new array that we have just filled.
L_0028: newobj instance void [mscorlib]System.Collections.Generic.List`1<class [mscorlib]System.Action>::.ctor() //{{{{{
L_002d: stloc.2 // list = new List<Action>()
L_002e: nop //}}}}}
L_002f: ldloc.1 //{{{{{
L_0030: stloc.3 //chPtrArray2 = chPtrArray}}}}}
L_0031: ldc.i4.0 //for (int num = 0; num < chPtrArray2.Length; num++)
L_0032: stloc.s num //
L_0034: br.s L_0062 //<<<<< (for start)
L_0036: newobj instance void ConsoleTests.Program/<>c__DisplayClass0_0::.ctor() //{{{{{
L_003b: stloc.s class_ //class_ = new instance of the compiler-generated closure class
L_003d: ldloc.s class_ //}}}}}
L_003f: ldloc.3 //{{{{{
L_0040: ldloc.s num //
L_0042: ldelem.i //
L_0043: stfld char* ConsoleTests.Program/<>c__DisplayClass0_0::pointer //class_.pointer = chPtrArray2[num]}}}}}
L_0048: ldloc.2 //{{{{{
L_0049: ldloc.s class_ //
L_004b: ldftn instance void ConsoleTests.Program/<>c__DisplayClass0_0::<Main>b__0() // list.Add(class_.<Main>b__0);
L_0051: newobj instance void [mscorlib]System.Action::.ctor(object, native int) // (Adds a delegate bound to this iteration's closure instance, which holds the
L_0056: callvirt instance void [mscorlib]System.Collections.Generic.List`1<class [mscorlib]System.Action>::Add(!0) //correct pointer because a new instance is created for every iteration)}}}}}
L_005b: nop
L_005c: ldloc.s num //practically the end of the for
L_005e: ldc.i4.1 // (actually increasing num and comparing)
L_005f: add //
L_0060: stloc.s num //
L_0062: ldloc.s num //
L_0064: ldloc.3 //
L_0065: ldlen //
L_0066: conv.i4 //
L_0067: blt.s L_0036 //>>>>> (for complete)
L_0069: ldstr "List complete" //Printing and stuff.....
L_006e: call void [mscorlib]System.Console::WriteLine(string)
L_0073: nop
L_0074: nop
L_0075: ldloc.2
L_0076: callvirt instance valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator`0<!0> [mscorlib]System.Collections.Generic.List`1<class [mscorlib]System.Action>::GetEnumerator()
L_007b: stloc.s enumerator
L_007d: br.s L_0090
L_007f: ldloca.s enumerator
L_0081: call instance !0 [mscorlib]System.Collections.Generic.List`1/Enumerator`0<class [mscorlib]System.Action>::get_Current()
L_0086: stloc.s action
L_0088: ldloc.s action
L_008a: callvirt instance void [mscorlib]System.Action::Invoke()
L_008f: nop
L_0090: ldloca.s enumerator
L_0092: call instance bool [mscorlib]System.Collections.Generic.List`1/Enumerator`0<class [mscorlib]System.Action>::MoveNext()
L_0097: brtrue.s L_007f
L_0099: leave.s L_00aa
L_009b: ldloca.s enumerator
L_009d: constrained. [mscorlib]System.Collections.Generic.List`1/Enumerator`0<class [mscorlib]System.Action>
L_00a3: callvirt instance void [mscorlib]System.IDisposable::Dispose()
L_00a8: nop
L_00a9: endfinally
L_00aa: ret
.try L_007d to L_009b finally handler L_009b to L_00aa
}
As you can see, a class called <>c__DisplayClass0_0 is generated at compile time. It contains your Action (as a method) and a char* field. The class looks like this:
[CompilerGenerated]
private sealed class <>c__DisplayClass0_0
{
    // Fields
    public unsafe char* pointer;
    // Methods
    internal unsafe void <Main>b__0()
    {
        Console.WriteLine("Pointer address is {0:X2}.", (long) ((ulong) this.pointer));
    }
}
In the MSIL code we can see that the foreach is compiled to the following for loop:
shallowCloneOfArray = arrayOfPointers; // copies the array reference, not the elements
for (int num = 0; num < shallowCloneOfArray.Length; num++)
{
    <>c__DisplayClass0_0 temp = new <>c__DisplayClass0_0();
    temp.pointer = shallowCloneOfArray[num];
    list.Add(temp.<Main>b__0); // adds the action to the list of actions
}
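This means the pointer value is actually copied while the loop iterates, at the moment each delegate is created, so the value printed is the one the pointer had in that iteration (in other words, each action comes from its own <>c__DisplayClass0_0 instance and receives its own copy of the pointer).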
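The same lowering can be written by hand in ordinary C#. The sketch below is only an approximation under assumptions: PointerClosure, Describe, and BuildActions are hypothetical names, a long field stands in for the char* field, and the delegates return strings instead of writing to the console so the result is easy to inspect.

```csharp
using System;
using System.Collections.Generic;

public static class DisplayClassSketch
{
    // Hand-written stand-in for the compiler-generated <>c__DisplayClass0_0.
    private sealed class PointerClosure
    {
        public long Pointer; // stands in for the captured char* field

        public string Describe()
        {
            return string.Format("Pointer address is {0:X2}.", Pointer);
        }
    }

    public static List<Func<string>> BuildActions(long[] addresses)
    {
        var list = new List<Func<string>>();
        long[] copy = addresses; // copies the array reference only
        for (int num = 0; num < copy.Length; num++)
        {
            // One closure object per iteration, holding that iteration's value.
            var temp = new PointerClosure();
            temp.Pointer = copy[num];
            list.Add(temp.Describe);
        }
        return list;
    }

    public static void Main()
    {
        foreach (var action in BuildActions(new long[] { 0, 2, 4, 200 }))
            Console.WriteLine(action());
    }
}
```

Because a new PointerClosure is allocated inside the loop body, no two delegates share state, which matches the per-iteration newobj in the IL above.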
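As we just saw, the only "reused variable" set up before the loop is the reference to the array itself, so the captured pointer is not reused between iterations. That means that if the specification says what you quoted, it is wrong on this point: the quoted expansion declares a single shared variable, which would make every delegate print the last value (C8 C8 C8 C8). The actual result: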
List complete
Pointer address is 00.
Pointer address is 02.
Pointer address is 04.
Pointer address is C8.