Can/does (forward) pipe operator 防止尾调用优化?
Can/does the (forward) pipe operator prevent tail call optimization?
对于工作中的参数优化问题,我编写了一个遗传算法来找到一些好的设置,因为蛮力解决方案是不可行的。不幸的是,当我在早上 return 时,大多数时候我看到的是 WhosebugException
.
我使用 F# 已经有一段时间了,所以我知道 TCO 以及对带有累加器参数的函数的需求,并且通常使用这种形式。
经过大量搜索,我认为我能够确定触发异常的代码:
breedPopulation alive |> simulate (generation + 1) lastTime ewma
breedPopulation
从当前 alive
中的个体生成新一代。然后下一个 round/generation 开始调用 simulate
。当我查看反汇编(完全菜鸟)时,我发现了一些 pop
和一个 ret
,所以它看起来不像是对我的常规(非尾)调用。
mov rcx,qword ptr [rbp+10h]
mov rcx,qword ptr [rcx+8]
mov rdx,qword ptr [rbp-40h]
cmp dword ptr [rcx],ecx
call 00007FFA3E4905C0
mov qword ptr [rbp-0F0h],rax
mov r8,qword ptr [rbp-0F0h]
mov qword ptr [rbp-80h],r8
mov r8,qword ptr [rbp-78h]
mov qword ptr [rsp+20h],r8
mov r8d,dword ptr [rbp+18h]
inc r8d
mov rdx,qword ptr [rbp+10h]
mov r9,qword ptr [rbp-20h]
mov rcx,7FFA3E525960h
call 00007FFA3E4A5040
mov qword ptr [rbp-0F8h],rax
mov rcx,qword ptr [rbp-0F8h]
mov rdx,qword ptr [rbp-80h]
mov rax,qword ptr [rbp-0F8h]
mov rax,qword ptr [rax]
mov rax,qword ptr [rax+40h]
call qword ptr [rax+20h]
mov qword ptr [rbp-100h],rax
mov rax,qword ptr [rbp-100h]
lea rsp,[rbp-10h]
pop rsi
pop rdi
pop rbp
ret
扔掉pipe operator,把breeding放在正常参数位置后,拆解就不一样了
// simulate (generation + 1) lastTime ewma (breedPopulation alive)
mov ecx,dword ptr [rbp+18h]
inc ecx
mov dword ptr [rbp-30h],ecx
mov rcx,qword ptr [rbp-20h]
mov qword ptr [rbp-38h],rcx
mov rcx,qword ptr [rbp-80h]
mov qword ptr [rbp-0F0h],rcx
mov rcx,qword ptr [rbp+10h]
mov rcx,qword ptr [rcx+8]
mov rdx,qword ptr [rbp-48h]
cmp dword ptr [rcx],ecx
call 00007FFA3E4605C0
mov qword ptr [rbp-0F8h],rax
mov rax,qword ptr [rbp-0F8h]
mov qword ptr [rbp+30h],rax
mov rax,qword ptr [rbp-0F0h]
mov qword ptr [rbp+28h],rax
mov rax,qword ptr [rbp-38h]
mov qword ptr [rbp+20h],rax
mov eax,dword ptr [rbp-30h]
mov dword ptr [rbp+18h],eax
nop
jmp 00007FFA3E47585B
这绝对更短,最后 jmp
甚至比尾调用更好。
因此我想了解 如果以及为什么 |>
似乎是问题所在,以及何时会产生影响——毕竟,这是第一次多年后它咬我。在什么情况下发生,需要注意什么?
更新: Guy pointed out that my listings are not IL but assembly, I first reworded the question. This is what I found out with ILSpy之后:
使用 |> 运算符
看反编译的C#,代码似乎在
之间来回跳转
internal static FSharpFunc<Types.Genome[], System.Tuple<System.Tuple<float, float>, LbpArea[]>[]> simulate@265-1(Universe x, System.Threading.ManualResetEvent pleaseStop, int generation, System.DateTime lastTime, FSharpOption<double> ewma)
{
return new $Universe.simulate@267-2(x, pleaseStop, generation, lastTime, ewma);
}
和
// internal class simulate@267-2
public override System.Tuple<System.Tuple<float, float>, LbpArea[]>[] Invoke(Types.Genome[] population)
{
LbpArea[][] array = ArrayModule.Parallel.Map<Types.Genome, LbpArea[]>(this.x.genomeToArray, population);
FSharpFunc<System.Tuple<System.Tuple<float, float>, LbpArea[]>, float> accessFitness = this.x.accessFitness;
System.Tuple<System.Tuple<float, float>, LbpArea[]>[] array2 = ArrayModule.Filter<System.Tuple<System.Tuple<float, float>, LbpArea[]>>(new $Universe.alive@274(accessFitness), ArrayModule.Parallel.Map<LbpArea[], System.Tuple<System.Tuple<float, float>, LbpArea[]>>(new $Universe.alive@273-1(this.x), array));
if (array2 == null)
{
throw new System.ArgumentNullException("array");
}
System.Tuple<System.Tuple<float, float>, LbpArea[]>[] array3 = ArrayModule.SortWith<System.Tuple<System.Tuple<float, float>, LbpArea[]>>(new $Universe.alive@275-2(), array2);
this.x.Population = array3;
System.Tuple<System.DateTime, FSharpOption<double>> tuple = this.x.printProgress<float, LbpArea[]>(this.lastTime, this.ewma, this.generation, array3);
System.DateTime item = tuple.Item1;
FSharpOption<double> item2 = tuple.Item2;
if (this.pleaseStop.WaitOne(0))
{
return array3;
}
Types.Genome[] func = this.x.breedPopulation(array3);
return $Universe.simulate@265-1(this.x, this.pleaseStop, this.generation + 1, item, item2).Invoke(func);
}
在 new
调用的 IL 中找不到 tail.
操作。另一方面,Invoke
的最后几行的 IL 读取
IL_00d3: call class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<class BioID.GeneticLbp.Types/Genome[], class [mscorlib]System.Tuple`2<class [mscorlib]System.Tuple`2<float32, float32>, valuetype [BioID.Operations.Biometrics]BioID.Operations.Biometrics.LbpArea[]>[]> '<StartupCode$BioID-GeneticLbp>.$Universe'::'simulate@265-1'(class BioID.GeneticLbp.Universe, class [mscorlib]System.Threading.ManualResetEvent, int32, valuetype [mscorlib]System.DateTime, class [FSharp.Core]Microsoft.FSharp.Core.FSharpOption`1<float64>)
IL_00d8: ldloc.s 7
IL_00da: tail.
IL_00dc: callvirt instance !1 class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<class BioID.GeneticLbp.Types/Genome[], class [mscorlib]System.Tuple`2<class [mscorlib]System.Tuple`2<float32, float32>, valuetype [BioID.Operations.Biometrics]BioID.Operations.Biometrics.LbpArea[]>[]>::Invoke(!0)
IL_00e1: ret
我不知道该怎么做。
没有 |> 运算符
另一个版本确实很不一样。从
开始
internal static System.Tuple<System.Tuple<float, float>, LbpArea[]>[] simulate@264(Universe x, System.Threading.ManualResetEvent pleaseStop, Unit unitVar0)
{
FSharpFunc<int, FSharpFunc<System.DateTime, FSharpFunc<FSharpOption<double>, FSharpFunc<Types.Genome[], System.Tuple<System.Tuple<float, float>, LbpArea[]>[]>>>> fSharpFunc = new $Universe.simulate@265-2(x, pleaseStop);
(($Universe.simulate@265-2)fSharpFunc).x = x;
(($Universe.simulate@265-2)fSharpFunc).pleaseStop = pleaseStop;
System.Tuple<System.Tuple<float, float>, LbpArea[]>[] population = x.Population;
Types.Genome[] func;
if (population != null && population.Length == 0)
{
func = x.lengthRandomlyIncreasing(x.laws@53.PopulationSize@);
return FSharpFunc<int, System.DateTime>.InvokeFast<FSharpOption<double>, FSharpFunc<Types.Genome[], System.Tuple<System.Tuple<float, float>, LbpArea[]>[]>>(fSharpFunc, 0, System.DateTime.Now, null).Invoke(func);
}
FSharpFunc<LbpArea[], Types.Genome> arrayToGenome = x.arrayToGenome;
func = ArrayModule.Parallel.Map<System.Tuple<System.Tuple<float, float>, LbpArea[]>, Types.Genome>(new $Universe.simulate@296-3(arrayToGenome), population);
return FSharpFunc<int, System.DateTime>.InvokeFast<FSharpOption<double>, FSharpFunc<Types.Genome[], System.Tuple<System.Tuple<float, float>, LbpArea[]>[]>>(fSharpFunc, 0, System.DateTime.Now, null).Invoke(func);
}
它转到
// internal class simulate@265-2
public override System.Tuple<System.Tuple<float, float>, LbpArea[]>[] Invoke(int generation, System.DateTime lastTime, FSharpOption<double> ewma, Types.Genome[] population)
{
return $Universe.simulate@265-1(this.x, this.pleaseStop, generation, lastTime, ewma, population);
}
最后
internal static System.Tuple<System.Tuple<float, float>, LbpArea[]>[] simulate@265-1(Universe x, System.Threading.ManualResetEvent pleaseStop, int generation, System.DateTime lastTime, FSharpOption<double> ewma, Types.Genome[] population)
{
while (true)
{
// Playing evolution...
if (pleaseStop.WaitOne(0))
{
return array3;
}
// Setting up parameters for next loop...
}
throw new System.ArgumentNullException("array");
}
tl;博士
毫无疑问,管道运算符的使用彻底改变了程序流程。我的猜测是这两个函数之间的来回是最终导致异常的原因。
我已经读过 Tail Calls in F# 但我认为它不适用于这种情况,因为我没有使用 first-class 函数 returning 单位作为值(在我的 F# 代码中)。
所以问题仍然存在:为什么管道运算符会在这里产生这种破坏性影响?我怎么知道 beforehand/what 我需要注意什么?
更新二:
您可以在 GitHub 找到示例的简化版本。请亲自查看 inline
运算符 |>
更改了生成的 IL,这不是我所期望的。
在减少示例的同时,幸运的是我能够找到异常的真正来源。您可以检查 branch 以获得更小的变体。毕竟,它与管道没有任何关系,但我仍然不明白,因为恕我直言 是 尾递归。
但我最初的问题仍然存在。我只是添加了一个 更多 。 :)
基于所提供的最小情况,如果代码在 64 位的发布模式下为 运行,则它会因堆栈溢出而失败。如果代码在 32 位模式的发布模式下为 运行,则成功。
注意:选择 32 位和 64 位的选项是 Prefer 32-bit
,如下图所示。
增加堆栈大小将导致代码在 64 位版本中成功进入发布模式。这是通过使用 Thread constructor.
完成的
[<EntryPoint>]
let main _ =
let test () =
let r = KissRandom()
let n = r.Normal()
Seq.item 20000 n |> printfn "%f"
/// The greatest maximum-stack-size that should be used
/// with the 'runWithStackFrame' function.
let STACK_LIMIT = 16777216
/// Run a function with a custom maximum stack size.
/// This is necessary for some functions to execute
/// without raising a WhosebugException.
let runWithCustomStackSize maxStackSize fn =
// Preconditions
if maxStackSize < 1048576 then
invalidArg "stackSize" "Functions should not be executed with a \
maximum stack size of less than 1048576 bytes (1MB)."
elif maxStackSize > STACK_LIMIT then
invalidArg "stackSize" "The maximum size of the stack frame should \
not exceed 16777216 bytes (16MB)."
/// Holds the return value of the function.
let result = ref Unchecked.defaultof<'T>
// Create a thread with the specified maximum stack size,
// then immediately execute the function on it.
let thread = System.Threading.Thread ((fun () -> result := fn()), maxStackSize)
thread.Start ()
// Wait for the function/thread to finish and return the result.
thread.Join ()
!result
/// Runs a function within a thread which has an enlarged maximum-stack-size.
let inline runWithEnlargedStack fn =
runWithCustomStackSize STACK_LIMIT fn
// test () // Fails with stack overflow in 64-bit mode, Release
// Runs successfully in 32-bit mode, Release
runWithEnlargedStack test
printf "Press any key to exit: "
System.Console.ReadKey() |> ignore
printfn ""
0
此代码来自 FSharp-logic-examples,特别是 Anh-Dung Phan
虽然我没有检查根本原因,但我怀疑这是因为 64 位项目的大小大于 32 位项目的大小,即使放置的项目数量进入堆栈并且两个版本的堆栈大小保持相同,项目大小的增加将堆栈所需的内存推到超过 1 兆字节的限制。
TL;DR
这是一个有趣且富有启发性的问题。很高兴被问到。
最初问题似乎与 |>
和 TCO 的使用有关,因为它仍然有价值,所以我将其留在答案中。我还要感谢 OP 的回应和帮助,很高兴能帮助与你共事而不是反对你的人。
在下面的递归代码中,|>
在 Visual Studio 的调试模式下是 运行,它会导致 Whosebug。
如果它是从 bin\release
目录的命令行启动的,它不会导致 Whosebug。
使用 Visual Studio 15 个社区
[<EntryPoint>]
let main argv =
let largeList =
printfn "Creating large list"
[
for i in 1 .. 100000000 do
yield i
]
// causes Whosebug in Debug
// No Whosebug in Release
let sum4 l =
printfn "testing sum4"
let rec sumInner4 l acc =
match l with
| h::t ->
let acc = acc + h
acc |> sumInner4 t
| [] -> acc
sumInner4 l 0
let result4 = sum4 largeList
printfn "result4: %A" result4
在 Visual Studio 工具栏中设置 Release 或 Debug 的地方
调试模式下项目的选项是
发布模式下项目的选项是
tldr;
在测试这个过程中,我创建了 16 个不同的测试,并在调试和发布模式下构建它们,并验证它们是否 运行 完成或抛出堆栈溢出。这 16 个被分解为一组 4 个,每个 4 个案例。情况 1、5、9、13 是否定的并产生堆栈溢出以确保可以创建堆栈溢出。案例 2、6、10、14 是肯定的,表明尾调用正在工作并且没有导致堆栈溢出。案例 3、7、11、15 显示了一个尾部调用,其操作在与尾部调用相同的语句中完成,并且与使用 |>
的测试用例相差一个因式分解;这些按预期工作。案例 4、8、12、16 使用 |>
并显示它何时在调试模式下工作和不工作,这可能会让很多人感到惊讶。案例 1-4 和 9-12 使用 f x y
形式的函数,案例 8-11 使用 f x
形式的函数,案例 12-16 使用 f x y z
形式的函数].我最初做了前 8 个测试用例,但在 Keith 的评论之后又做了 4 个不使用列表但仍然使用 from f x y
的函数并呈现意外结果然后又做了 4 个使用函数的函数形式 f x y z
.
要进行 运行 测试,您必须注释掉除计划 运行 之外的所有测试,并在调试模式下构建一次,然后可以 运行从 Visual Studio 中,然后再次在发布模式下构建它并 运行 它。我从命令行 运行 确保我 运行 发布版本。
[<EntryPoint>]
let main argv =
let largeList =
printfn "Creating large list"
[
for i in 1 .. 100000000 do
yield i
]
// causes Whosebug in Debug
// causes Whosebug in Release
// Negative confirmation
// A supposed tail call that DOES cause a stack overflow in both debug and release mode
// options: f x y
let sum1 l =
printfn "test 01: "
let rec sum1Inner l acc =
match l with
| h::t ->
let acc = acc + h
1 + sum1Inner t acc
| [] -> acc
sum1Inner l 0
// No Whosebug in Debug
// No Whosebug in Release
// Positive confirmation
// A tail call that DOES NOT cause a stack overflow in both debug and release mode
// options: f x y
let sum2 l =
printfn "test 02: "
let rec sum2Inner l acc =
match l with
| h::t ->
let acc = acc + h
sum2Inner t acc
| [] -> acc
sum2Inner l 0
// No Whosebug in Debug
// No Whosebug in Release
// A test case
// options: f x y and no |>
let sum3 l =
printfn "test 03: "
let rec sum3Inner l acc =
match l with
| h::t ->
sum3Inner t (acc + h)
| [] -> acc
sum3Inner l 0
// causes Whosebug in Debug
// No Whosebug in Release
// A test case
// options: f x y and |>
let sum4 l =
printfn "test 04: "
let rec sum4Inner l acc =
match l with
| h::t ->
let acc = acc + h
acc |> sum4Inner t
| [] -> acc
sum4Inner l 0
// causes Whosebug in Debug
// causes Whosebug in Release
// Negative confirmation
// A supposed tail call that DOES cause a stack overflow in both debug and release mode
// options: f x
let sum5 () =
printfn "test 05: "
let rec sum5Inner x =
match x with
| 10000000 -> x
| _ ->
let acc = x + 1
1 + sum5Inner acc
sum5Inner 0
// No Whosebug in Debug
// No Whosebug in Release
// Positive confirmation
// A tail call that DOES NOT cause a stack overflow in both debug and release mode
// options: f x
let sum6 () =
printfn "test 06: "
let rec sum6Inner x =
match x with
| 10000000 -> x
| _ ->
let acc = x + 1
sum6Inner acc
sum6Inner 0
// No Whosebug in Debug
// No Whosebug in Release
// A test case
// options: f x and no |>
let sum7 l =
printfn "test 07: "
let rec sum7Inner x =
match x with
| 10000000 -> x
| _ -> sum7Inner (x + 1)
sum7Inner 0
// No Whosebug in Debug
// No Whosebug in Release
// A test case
// options: f x and |>
let sum8 () =
printfn "test 07: "
let rec sumInner8 x =
match x with
| 10000000 -> x
| _ ->
let acc = x + 1
acc |> sumInner8
sumInner8 0
// causes Whosebug in Debug
// causes Whosebug in Release
// Negative confirmation"
// A supposed tail call that DOES cause a stack overflow in both debug and release mode"
// options: f x y"
let sum9 () =
printfn "test 09: "
let rec sum9Inner x y =
match y with
| 10000000 -> y
| _ ->
let acc = x + y
1 + sum9Inner x acc
sum9Inner 1 0
// No Whosebug in Debug
// No Whosebug in Release
// Positive confirmation
// A tail call that DOES NOT cause a stack overflow in both debug and release mode
// options: f x y
let sum10 () =
printfn "test 10: "
let rec sum10Inner x y =
match y with
| 10000000 -> y
| _ ->
let acc = x + y
sum10Inner x acc
sum10Inner 1 0
// No Whosebug in Debug
// No Whosebug in Release
// A test case
// options: f x y and no |>
let sum11 () =
printfn "test 11: "
let rec sum11Inner x y =
match y with
| 10000000 -> y
| _ ->
sum11Inner x (x + y)
sum11Inner 1 0
// causes Whosebug in Debug
// No Whosebug in Release
// A test case
// options: f x y and |>
let sum12 () =
printfn "test 12: "
let rec sum12Inner x y =
match y with
| 10000000 -> y
| _ ->
let acc = x + y
acc |> sum12Inner x
sum12Inner 1 0
// causes Whosebug in Debug
// No Whosebug in Release
// A test case"
// options: f x y and |>"
let sum12 () =
printfn "test 12: "
let rec sum12Inner x y =
match y with
| 10000000 -> y
| _ ->
let acc = x + y
acc |> sum12Inner x
sum12Inner 1 0
// causes Whosebug in Debug
// causes Whosebug in Release
// Negative confirmation"
// A supposed tail call that DOES cause a stack overflow in both debug and release mode"
// options: f x y"
let sum13 () =
printfn "test 13: "
let rec sum13Inner x z y =
match y with
| 10000000 -> y
| _ ->
let acc = x + y
1 + sum13Inner x z acc
sum13Inner 1 "z" 0
// No Whosebug in Debug
// No Whosebug in Release
// Positive confirmation"
// A tail call that DOES NOT cause a stack overflow in both debug and release mode"
// options: f x y"
let sum14 () =
printfn "test 14: "
let rec sum14Inner x z y =
match y with
| 10000000 -> y
| _ ->
let acc = x + y
sum14Inner x z acc
sum14Inner 1 "z" 0
// No Whosebug in Debug
// No Whosebug in Release
// A test case"
// options: f x y and no |>"
let sum15 () =
printfn "test 15: "
let rec sum15Inner x z y =
match y with
| 10000000 -> y
| _ ->
sum15Inner x z (x + y)
sum15Inner 1 "z" 0
// causes Whosebug in Debug
// No Whosebug in Release
// A test case"
// options: f x y and |>"
let sum16 () =
printfn "test 16: "
let rec sum16Inner x z y =
match y with
| 10000000 -> y
| _ ->
let acc = x + y
acc |> sum16Inner x z
sum16Inner 1 "z" 0
let result1 = sum1 largeList
printfn "result1: %A" result1
let result2 = sum2 largeList
printfn "result2: %A" result2
let result3 = sum3 largeList
printfn "result3: %A" result3
let result4 = sum4 largeList
printfn "result4: %A" result4
let result5 = sum5 ()
printfn "result5: %A" result5
let result6 = sum6 ()
printfn "result6: %A" result6
let result7 = sum7 ()
printfn "result7: %A" result7
let result8 = sum8 ()
printfn "result8: %A" result8
let result9 = sum9 ()
printfn "result9: %A" result9
let result10 = sum10 ()
printfn "result10: %A" result10
let result11 = sum11 ()
printfn "result11: %A" result11
let result12 = sum12 ()
printfn "result12: %A" result12
let result13 = sum13 ()
printfn "result13: %A" result13
let result14 = sum14 ()
printfn "result14: %A" result14
let result15 = sum15 ()
printfn "result15: %A" result15
let result16 = sum16 ()
printfn "result16: %A" result16
printf "Press any key to exit: "
System.Console.ReadKey() |> ignore
printfn ""
0 // return an integer exit code
额外的新信息
编辑:This thread on Github has Don Syme, creator of F#, specifically mention that:
[...] Second, you are correct, we don't guarantee to optimize uses of f <| x
or x |> f
or any similar to first-calling tailcalls even if f x
is a tailcall.
对于工作中的参数优化问题,我编写了一个遗传算法来找到一些好的设置,因为蛮力解决方案是不可行的。不幸的是,当我在早上 return 时,大多数时候我看到的是 WhosebugException
.
我使用 F# 已经有一段时间了,所以我知道 TCO 以及对带有累加器参数的函数的需求,并且通常使用这种形式。
经过大量搜索,我认为我能够确定触发异常的代码:
breedPopulation alive |> simulate (generation + 1) lastTime ewma
breedPopulation
从当前 alive
中的个体生成新一代。然后下一个 round/generation 开始调用 simulate
。当我查看反汇编(完全菜鸟)时,我发现了一些 pop
和一个 ret
,所以它看起来不像是对我的常规(非尾)调用。
mov rcx,qword ptr [rbp+10h]
mov rcx,qword ptr [rcx+8]
mov rdx,qword ptr [rbp-40h]
cmp dword ptr [rcx],ecx
call 00007FFA3E4905C0
mov qword ptr [rbp-0F0h],rax
mov r8,qword ptr [rbp-0F0h]
mov qword ptr [rbp-80h],r8
mov r8,qword ptr [rbp-78h]
mov qword ptr [rsp+20h],r8
mov r8d,dword ptr [rbp+18h]
inc r8d
mov rdx,qword ptr [rbp+10h]
mov r9,qword ptr [rbp-20h]
mov rcx,7FFA3E525960h
call 00007FFA3E4A5040
mov qword ptr [rbp-0F8h],rax
mov rcx,qword ptr [rbp-0F8h]
mov rdx,qword ptr [rbp-80h]
mov rax,qword ptr [rbp-0F8h]
mov rax,qword ptr [rax]
mov rax,qword ptr [rax+40h]
call qword ptr [rax+20h]
mov qword ptr [rbp-100h],rax
mov rax,qword ptr [rbp-100h]
lea rsp,[rbp-10h]
pop rsi
pop rdi
pop rbp
ret
扔掉pipe operator,把breeding放在正常参数位置后,拆解就不一样了
// simulate (generation + 1) lastTime ewma (breedPopulation alive)
mov ecx,dword ptr [rbp+18h]
inc ecx
mov dword ptr [rbp-30h],ecx
mov rcx,qword ptr [rbp-20h]
mov qword ptr [rbp-38h],rcx
mov rcx,qword ptr [rbp-80h]
mov qword ptr [rbp-0F0h],rcx
mov rcx,qword ptr [rbp+10h]
mov rcx,qword ptr [rcx+8]
mov rdx,qword ptr [rbp-48h]
cmp dword ptr [rcx],ecx
call 00007FFA3E4605C0
mov qword ptr [rbp-0F8h],rax
mov rax,qword ptr [rbp-0F8h]
mov qword ptr [rbp+30h],rax
mov rax,qword ptr [rbp-0F0h]
mov qword ptr [rbp+28h],rax
mov rax,qword ptr [rbp-38h]
mov qword ptr [rbp+20h],rax
mov eax,dword ptr [rbp-30h]
mov dword ptr [rbp+18h],eax
nop
jmp 00007FFA3E47585B
这绝对更短,最后 jmp
甚至比尾调用更好。
因此我想了解 如果以及为什么 |>
似乎是问题所在,以及何时会产生影响——毕竟,这是第一次多年后它咬我。在什么情况下发生,需要注意什么?
更新: Guy pointed out that my listings are not IL but assembly, I first reworded the question. This is what I found out with ILSpy之后:
使用 |> 运算符
看反编译的C#,代码似乎在
之间来回跳转internal static FSharpFunc<Types.Genome[], System.Tuple<System.Tuple<float, float>, LbpArea[]>[]> simulate@265-1(Universe x, System.Threading.ManualResetEvent pleaseStop, int generation, System.DateTime lastTime, FSharpOption<double> ewma)
{
return new $Universe.simulate@267-2(x, pleaseStop, generation, lastTime, ewma);
}
和
// internal class simulate@267-2
public override System.Tuple<System.Tuple<float, float>, LbpArea[]>[] Invoke(Types.Genome[] population)
{
LbpArea[][] array = ArrayModule.Parallel.Map<Types.Genome, LbpArea[]>(this.x.genomeToArray, population);
FSharpFunc<System.Tuple<System.Tuple<float, float>, LbpArea[]>, float> accessFitness = this.x.accessFitness;
System.Tuple<System.Tuple<float, float>, LbpArea[]>[] array2 = ArrayModule.Filter<System.Tuple<System.Tuple<float, float>, LbpArea[]>>(new $Universe.alive@274(accessFitness), ArrayModule.Parallel.Map<LbpArea[], System.Tuple<System.Tuple<float, float>, LbpArea[]>>(new $Universe.alive@273-1(this.x), array));
if (array2 == null)
{
throw new System.ArgumentNullException("array");
}
System.Tuple<System.Tuple<float, float>, LbpArea[]>[] array3 = ArrayModule.SortWith<System.Tuple<System.Tuple<float, float>, LbpArea[]>>(new $Universe.alive@275-2(), array2);
this.x.Population = array3;
System.Tuple<System.DateTime, FSharpOption<double>> tuple = this.x.printProgress<float, LbpArea[]>(this.lastTime, this.ewma, this.generation, array3);
System.DateTime item = tuple.Item1;
FSharpOption<double> item2 = tuple.Item2;
if (this.pleaseStop.WaitOne(0))
{
return array3;
}
Types.Genome[] func = this.x.breedPopulation(array3);
return $Universe.simulate@265-1(this.x, this.pleaseStop, this.generation + 1, item, item2).Invoke(func);
}
在 new
调用的 IL 中找不到 tail.
操作。另一方面,Invoke
的最后几行的 IL 读取
IL_00d3: call class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<class BioID.GeneticLbp.Types/Genome[], class [mscorlib]System.Tuple`2<class [mscorlib]System.Tuple`2<float32, float32>, valuetype [BioID.Operations.Biometrics]BioID.Operations.Biometrics.LbpArea[]>[]> '<StartupCode$BioID-GeneticLbp>.$Universe'::'simulate@265-1'(class BioID.GeneticLbp.Universe, class [mscorlib]System.Threading.ManualResetEvent, int32, valuetype [mscorlib]System.DateTime, class [FSharp.Core]Microsoft.FSharp.Core.FSharpOption`1<float64>)
IL_00d8: ldloc.s 7
IL_00da: tail.
IL_00dc: callvirt instance !1 class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<class BioID.GeneticLbp.Types/Genome[], class [mscorlib]System.Tuple`2<class [mscorlib]System.Tuple`2<float32, float32>, valuetype [BioID.Operations.Biometrics]BioID.Operations.Biometrics.LbpArea[]>[]>::Invoke(!0)
IL_00e1: ret
我不知道该怎么做。
没有 |> 运算符
另一个版本确实很不一样。从
开始internal static System.Tuple<System.Tuple<float, float>, LbpArea[]>[] simulate@264(Universe x, System.Threading.ManualResetEvent pleaseStop, Unit unitVar0)
{
FSharpFunc<int, FSharpFunc<System.DateTime, FSharpFunc<FSharpOption<double>, FSharpFunc<Types.Genome[], System.Tuple<System.Tuple<float, float>, LbpArea[]>[]>>>> fSharpFunc = new $Universe.simulate@265-2(x, pleaseStop);
(($Universe.simulate@265-2)fSharpFunc).x = x;
(($Universe.simulate@265-2)fSharpFunc).pleaseStop = pleaseStop;
System.Tuple<System.Tuple<float, float>, LbpArea[]>[] population = x.Population;
Types.Genome[] func;
if (population != null && population.Length == 0)
{
func = x.lengthRandomlyIncreasing(x.laws@53.PopulationSize@);
return FSharpFunc<int, System.DateTime>.InvokeFast<FSharpOption<double>, FSharpFunc<Types.Genome[], System.Tuple<System.Tuple<float, float>, LbpArea[]>[]>>(fSharpFunc, 0, System.DateTime.Now, null).Invoke(func);
}
FSharpFunc<LbpArea[], Types.Genome> arrayToGenome = x.arrayToGenome;
func = ArrayModule.Parallel.Map<System.Tuple<System.Tuple<float, float>, LbpArea[]>, Types.Genome>(new $Universe.simulate@296-3(arrayToGenome), population);
return FSharpFunc<int, System.DateTime>.InvokeFast<FSharpOption<double>, FSharpFunc<Types.Genome[], System.Tuple<System.Tuple<float, float>, LbpArea[]>[]>>(fSharpFunc, 0, System.DateTime.Now, null).Invoke(func);
}
它转到
// internal class simulate@265-2
public override System.Tuple<System.Tuple<float, float>, LbpArea[]>[] Invoke(int generation, System.DateTime lastTime, FSharpOption<double> ewma, Types.Genome[] population)
{
return $Universe.simulate@265-1(this.x, this.pleaseStop, generation, lastTime, ewma, population);
}
最后
internal static System.Tuple<System.Tuple<float, float>, LbpArea[]>[] simulate@265-1(Universe x, System.Threading.ManualResetEvent pleaseStop, int generation, System.DateTime lastTime, FSharpOption<double> ewma, Types.Genome[] population)
{
while (true)
{
// Playing evolution...
if (pleaseStop.WaitOne(0))
{
return array3;
}
// Setting up parameters for next loop...
}
throw new System.ArgumentNullException("array");
}
tl;博士
毫无疑问,管道运算符的使用彻底改变了程序流程。我的猜测是这两个函数之间的来回是最终导致异常的原因。
我已经读过 Tail Calls in F# 但我认为它不适用于这种情况,因为我没有使用 first-class 函数 returning 单位作为值(在我的 F# 代码中)。
所以问题仍然存在:为什么管道运算符会在这里产生这种破坏性影响?我怎么知道 beforehand/what 我需要注意什么?
更新二:
您可以在 GitHub 找到示例的简化版本。请亲自查看 inline
运算符 |>
更改了生成的 IL,这不是我所期望的。
在减少示例的同时,幸运的是我能够找到异常的真正来源。您可以检查 branch 以获得更小的变体。毕竟,它与管道没有任何关系,但我仍然不明白,因为恕我直言 是 尾递归。
但我最初的问题仍然存在。我只是添加了一个 更多 。 :)
基于所提供的最小情况,如果代码在 64 位的发布模式下为 运行,则它会因堆栈溢出而失败。如果代码在 32 位模式的发布模式下为 运行,则成功。
注意:选择 32 位和 64 位的选项是 Prefer 32-bit
,如下图所示。
增加堆栈大小将导致代码在 64 位版本中成功进入发布模式。这是通过使用 Thread constructor.
完成的[<EntryPoint>]
let main _ =
let test () =
let r = KissRandom()
let n = r.Normal()
Seq.item 20000 n |> printfn "%f"
/// The greatest maximum-stack-size that should be used
/// with the 'runWithStackFrame' function.
let STACK_LIMIT = 16777216
/// Run a function with a custom maximum stack size.
/// This is necessary for some functions to execute
/// without raising a WhosebugException.
let runWithCustomStackSize maxStackSize fn =
// Preconditions
if maxStackSize < 1048576 then
invalidArg "stackSize" "Functions should not be executed with a \
maximum stack size of less than 1048576 bytes (1MB)."
elif maxStackSize > STACK_LIMIT then
invalidArg "stackSize" "The maximum size of the stack frame should \
not exceed 16777216 bytes (16MB)."
/// Holds the return value of the function.
let result = ref Unchecked.defaultof<'T>
// Create a thread with the specified maximum stack size,
// then immediately execute the function on it.
let thread = System.Threading.Thread ((fun () -> result := fn()), maxStackSize)
thread.Start ()
// Wait for the function/thread to finish and return the result.
thread.Join ()
!result
/// Runs a function within a thread which has an enlarged maximum-stack-size.
let inline runWithEnlargedStack fn =
runWithCustomStackSize STACK_LIMIT fn
// test () // Fails with stack overflow in 64-bit mode, Release
// Runs successfully in 32-bit mode, Release
runWithEnlargedStack test
printf "Press any key to exit: "
System.Console.ReadKey() |> ignore
printfn ""
0
此代码来自 FSharp-logic-examples,特别是 Anh-Dung Phan
虽然我没有检查根本原因,但我怀疑这是因为 64 位项目的大小大于 32 位项目的大小,即使放置的项目数量进入堆栈并且两个版本的堆栈大小保持相同,项目大小的增加将堆栈所需的内存推到超过 1 兆字节的限制。
TL;DR
这是一个有趣且富有启发性的问题。很高兴被问到。
最初问题似乎与 |>
和 TCO 的使用有关,因为它仍然有价值,所以我将其留在答案中。我还要感谢 OP 的回应和帮助,很高兴能帮助与你共事而不是反对你的人。
在下面的递归代码中,|>
在 Visual Studio 的调试模式下是 运行,它会导致 Whosebug。
如果它是从 bin\release
目录的命令行启动的,它不会导致 Whosebug。
使用 Visual Studio 15 个社区
[<EntryPoint>]
let main argv =
let largeList =
printfn "Creating large list"
[
for i in 1 .. 100000000 do
yield i
]
// causes Whosebug in Debug
// No Whosebug in Release
let sum4 l =
printfn "testing sum4"
let rec sumInner4 l acc =
match l with
| h::t ->
let acc = acc + h
acc |> sumInner4 t
| [] -> acc
sumInner4 l 0
let result4 = sum4 largeList
printfn "result4: %A" result4
在 Visual Studio 工具栏中设置 Release 或 Debug 的地方
调试模式下项目的选项是
发布模式下项目的选项是
tldr;
在测试这个过程中,我创建了 16 个不同的测试,并在调试和发布模式下构建它们,并验证它们是否 运行 完成或抛出堆栈溢出。这 16 个被分解为一组 4 个,每个 4 个案例。情况 1、5、9、13 是否定的并产生堆栈溢出以确保可以创建堆栈溢出。案例 2、6、10、14 是肯定的,表明尾调用正在工作并且没有导致堆栈溢出。案例 3、7、11、15 显示了一个尾部调用,其操作在与尾部调用相同的语句中完成,并且与使用 |>
的测试用例相差一个因式分解;这些按预期工作。案例 4、8、12、16 使用 |>
并显示它何时在调试模式下工作和不工作,这可能会让很多人感到惊讶。案例 1-4 和 9-12 使用 f x y
形式的函数,案例 8-11 使用 f x
形式的函数,案例 12-16 使用 f x y z
形式的函数].我最初做了前 8 个测试用例,但在 Keith 的评论之后又做了 4 个不使用列表但仍然使用 from f x y
的函数并呈现意外结果然后又做了 4 个使用函数的函数形式 f x y z
.
要进行 运行 测试,您必须注释掉除计划 运行 之外的所有测试,并在调试模式下构建一次,然后可以 运行从 Visual Studio 中,然后再次在发布模式下构建它并 运行 它。我从命令行 运行 确保我 运行 发布版本。
[<EntryPoint>]
let main argv =
let largeList =
printfn "Creating large list"
[
for i in 1 .. 100000000 do
yield i
]
// causes Whosebug in Debug
// causes Whosebug in Release
// Negative confirmation
// A supposed tail call that DOES cause a stack overflow in both debug and release mode
// options: f x y
let sum1 l =
printfn "test 01: "
let rec sum1Inner l acc =
match l with
| h::t ->
let acc = acc + h
1 + sum1Inner t acc
| [] -> acc
sum1Inner l 0
// No Whosebug in Debug
// No Whosebug in Release
// Positive confirmation
// A tail call that DOES NOT cause a stack overflow in both debug and release mode
// options: f x y
let sum2 l =
printfn "test 02: "
let rec sum2Inner l acc =
match l with
| h::t ->
let acc = acc + h
sum2Inner t acc
| [] -> acc
sum2Inner l 0
// No Whosebug in Debug
// No Whosebug in Release
// A test case
// options: f x y and no |>
let sum3 l =
printfn "test 03: "
let rec sum3Inner l acc =
match l with
| h::t ->
sum3Inner t (acc + h)
| [] -> acc
sum3Inner l 0
// causes Whosebug in Debug
// No Whosebug in Release
// A test case
// options: f x y and |>
let sum4 l =
printfn "test 04: "
let rec sum4Inner l acc =
match l with
| h::t ->
let acc = acc + h
acc |> sum4Inner t
| [] -> acc
sum4Inner l 0
// causes Whosebug in Debug
// causes Whosebug in Release
// Negative confirmation
// A supposed tail call that DOES cause a stack overflow in both debug and release mode
// options: f x
let sum5 () =
printfn "test 05: "
let rec sum5Inner x =
match x with
| 10000000 -> x
| _ ->
let acc = x + 1
1 + sum5Inner acc
sum5Inner 0
// No Whosebug in Debug
// No Whosebug in Release
// Positive confirmation
// A tail call that DOES NOT cause a stack overflow in both debug and release mode
// options: f x
let sum6 () =
printfn "test 06: "
let rec sum6Inner x =
match x with
| 10000000 -> x
| _ ->
let acc = x + 1
sum6Inner acc
sum6Inner 0
// No Whosebug in Debug
// No Whosebug in Release
// A test case
// options: f x and no |>
let sum7 l =
printfn "test 07: "
let rec sum7Inner x =
match x with
| 10000000 -> x
| _ -> sum7Inner (x + 1)
sum7Inner 0
// No Whosebug in Debug
// No Whosebug in Release
// A test case
// options: f x and |>
let sum8 () =
printfn "test 07: "
let rec sumInner8 x =
match x with
| 10000000 -> x
| _ ->
let acc = x + 1
acc |> sumInner8
sumInner8 0
// causes Whosebug in Debug
// causes Whosebug in Release
// Negative confirmation"
// A supposed tail call that DOES cause a stack overflow in both debug and release mode"
// options: f x y"
let sum9 () =
printfn "test 09: "
let rec sum9Inner x y =
match y with
| 10000000 -> y
| _ ->
let acc = x + y
1 + sum9Inner x acc
sum9Inner 1 0
// No Whosebug in Debug
// No Whosebug in Release
// Positive confirmation
// A tail call that DOES NOT cause a stack overflow in both debug and release mode
// options: f x y
let sum10 () =
printfn "test 10: "
let rec sum10Inner x y =
match y with
| 10000000 -> y
| _ ->
let acc = x + y
sum10Inner x acc
sum10Inner 1 0
// No Whosebug in Debug
// No Whosebug in Release
// A test case
// options: f x y and no |>
let sum11 () =
printfn "test 11: "
let rec sum11Inner x y =
match y with
| 10000000 -> y
| _ ->
sum11Inner x (x + y)
sum11Inner 1 0
// causes Whosebug in Debug
// No Whosebug in Release
// A test case
// options: f x y and |>
let sum12 () =
printfn "test 12: "
let rec sum12Inner x y =
match y with
| 10000000 -> y
| _ ->
let acc = x + y
acc |> sum12Inner x
sum12Inner 1 0
// causes Whosebug in Debug
// No Whosebug in Release
// A test case"
// options: f x y and |>"
let sum12 () =
printfn "test 12: "
let rec sum12Inner x y =
match y with
| 10000000 -> y
| _ ->
let acc = x + y
acc |> sum12Inner x
sum12Inner 1 0
// causes Whosebug in Debug
// causes Whosebug in Release
// Negative confirmation"
// A supposed tail call that DOES cause a stack overflow in both debug and release mode"
// options: f x y"
let sum13 () =
printfn "test 13: "
let rec sum13Inner x z y =
match y with
| 10000000 -> y
| _ ->
let acc = x + y
1 + sum13Inner x z acc
sum13Inner 1 "z" 0
// No Whosebug in Debug
// No Whosebug in Release
// Positive confirmation"
// A tail call that DOES NOT cause a stack overflow in both debug and release mode"
// options: f x y"
let sum14 () =
printfn "test 14: "
let rec sum14Inner x z y =
match y with
| 10000000 -> y
| _ ->
let acc = x + y
sum14Inner x z acc
sum14Inner 1 "z" 0
// No Whosebug in Debug
// No Whosebug in Release
// A test case"
// options: f x y and no |>"
let sum15 () =
printfn "test 15: "
let rec sum15Inner x z y =
match y with
| 10000000 -> y
| _ ->
sum15Inner x z (x + y)
sum15Inner 1 "z" 0
// causes Whosebug in Debug
// No Whosebug in Release
// A test case"
// options: f x y and |>"
let sum16 () =
printfn "test 16: "
let rec sum16Inner x z y =
match y with
| 10000000 -> y
| _ ->
let acc = x + y
acc |> sum16Inner x z
sum16Inner 1 "z" 0
let result1 = sum1 largeList
printfn "result1: %A" result1
let result2 = sum2 largeList
printfn "result2: %A" result2
let result3 = sum3 largeList
printfn "result3: %A" result3
let result4 = sum4 largeList
printfn "result4: %A" result4
let result5 = sum5 ()
printfn "result5: %A" result5
let result6 = sum6 ()
printfn "result6: %A" result6
let result7 = sum7 ()
printfn "result7: %A" result7
let result8 = sum8 ()
printfn "result8: %A" result8
let result9 = sum9 ()
printfn "result9: %A" result9
let result10 = sum10 ()
printfn "result10: %A" result10
let result11 = sum11 ()
printfn "result11: %A" result11
let result12 = sum12 ()
printfn "result12: %A" result12
let result13 = sum13 ()
printfn "result13: %A" result13
let result14 = sum14 ()
printfn "result14: %A" result14
let result15 = sum15 ()
printfn "result15: %A" result15
let result16 = sum16 ()
printfn "result16: %A" result16
printf "Press any key to exit: "
System.Console.ReadKey() |> ignore
printfn ""
0 // return an integer exit code
额外的新信息
编辑:This thread on Github has Don Syme, creator of F#, specifically mention that:
[...] Second, you are correct, we don't guarantee to optimize uses of
f <| x
orx |> f
or any similar to first-calling tailcalls even iff x
is a tailcall.