Why does Frame.ofRecords garble its results when fed a sequence generated by a parallel calculation?

Question

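I am running some code that computes a sequence of records and calls Frame.ofRecords with that sequence as its argument. The records are computed with PSeq.map from the FSharp.Collections.ParallelSeq library.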

If I convert the sequence to a list first, the output is fine. Here are the code and the output:

let summaryReport path (writeOpenPolicy: WriteOpenPolicy) (outputs: Output seq) =
    let foo (output: Output) =
        let temp =
            { Name          = output.Name
              Strategy      = string output.Strategy
              SharpeRatio   = (fst output.PandLStats).SharpeRatio
              CalmarRatio   = (fst output.PandLStats).CalmarRatio }
        printfn "************************************* %A" temp
        temp
    outputs
    |> Seq.map foo
    |> List.ofSeq // this is the line that makes a difference
    |> Frame.ofRecords
    |> frameToCsv path writeOpenPolicy ["Name"] "Summary_Statistics"


Name    Name        Strategy    SharpeRatio CalmarRatio
0   Singleton_AAPL  MyStrategy  0.317372564 0.103940018
1   Singleton_MSFT  MyStrategy  0.372516931 0.130150478
2   Singleton_IBM   MyStrategy              Infinity

The printfn call lets me verify by inspection that temp is computed correctly in every case. The last line of the code is just a wrapper around FrameExtensions.SaveCsv.

If I remove the |> List.ofSeq line, the output is garbled:

Name    Name        Strategy    SharpeRatio CalmarRatio
0   Singleton_IBM   MyStrategy  0.317372564 0.130150478
1   Singleton_MSFT  MyStrategy              0.103940018
2   Singleton_AAPL  MyStrategy  0.372516931 Infinity

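Note that the empty entry (which corresponds to NaN) and the Infinity entry are now in different rows, and other values are mixed up as well.

Why does this happen?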

Answer 1

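Parallel sequences run in arbitrary order because the work is split across many processors, so the result set comes back in no particular order. You can always sort the results afterwards, or process your data without parallelism.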
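One way to "sort afterwards" is to tag each input with its position before the parallel step and sort by that tag once the results are back. The following is only a sketch: the arbitrary arrival order is simulated by reversing a list by hand, and the names tagged, arrived and restored are made up for illustration.

```fsharp
let names = [ "Singleton_AAPL"; "Singleton_MSFT"; "Singleton_IBM" ]

// Tag each input with its original position before any parallel work.
let tagged = names |> List.indexed

// Stand-in for a parallel map whose results arrive out of order
// (assumption: simulated here by reversing the list).
let arrived =
    tagged
    |> List.rev
    |> List.map (fun (i, name) -> i, name.ToUpperInvariant())

// Sort the materialized results by the tag, then drop it.
let restored = arrived |> List.sortBy fst |> List.map snd
```

Because arrived is already a list, every later pass over it sees the same values in the same order.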

Answer 2

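The Frame.ofRecords function iterates over the sequence multiple times, so if your sequence returns different data on repeated enumeration, you will get inconsistent data in the frame.

Here is a minimal example: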

let mutable n = 0.
let nums = seq { for i in 0 .. 10 do n <- n + 1.; yield n, n }

Frame.ofRecords nums

This returns:

      Item1 Item2 
0  -> 1     12    
1  -> 2     13    
2  -> 3     14    
3  -> 4     15    
4  -> 5     16    
5  -> 6     17    
6  -> 7     18    
7  -> 8     19    
8  -> 9     20    
9  -> 10    21    
10 -> 11    22

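As you can see, the first item was obtained during the first iteration of the sequence, while the second item was obtained during the second iteration.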
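The same effect can be reproduced without Deedle. The sketch below only imitates a consumer that makes one pass per column; how Frame.ofRecords walks the sequence internally is not shown here. The names counter, firstPass, secondPass and frozen are introduced just for this illustration.

```fsharp
// A lazy sequence whose values depend on mutable state, like nums above.
let counter = ref 0.0
let nums =
    seq {
        for _ in 0 .. 10 do
            counter.Value <- counter.Value + 1.0
            yield counter.Value, counter.Value
    }

// Two independent passes over the same seq re-run the generator each time.
let firstPass  = nums |> Seq.map fst |> List.ofSeq   // 1.0 .. 11.0
let secondPass = nums |> Seq.map snd |> List.ofSeq   // 12.0 .. 22.0

// Materializing once freezes the values, so later passes agree.
let frozen  = nums |> List.ofSeq
let column1 = frozen |> List.map fst
let column2 = frozen |> List.map snd                 // equal to column1
```

This is why inserting List.ofSeq before Frame.ofRecords helps: the list is computed once, and every later traversal reads the same stored values.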

This should be better documented, but it improves performance in typical cases. If you can send a PR to the documentation, that would be useful.

parallel-processing

f#

deedle