Unsafe slicing a 2D array with ReadOnlySpan<T>

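On .NET Core 3.1, I have many large 2D arrays, and I need to operate on part of a single row of an array. The same slice may be used by multiple operations, so I want to perform the slicing only once and reuse the slice.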

The sample code below slices the array and then calls two functions that operate on the slice.

public void MyFunc()
{
    double[,] array = ...;  // populate the array

    // select which part of the array to slice, values not important
    int index0 = 0;
    int startIndex1 = 1;
    int sliceLength = 2;

    // slice the array
    ReadOnlySpan<double> slice = Slice(array, index0, startIndex1, sliceLength);

    // do things with the slice
    DoSomething1(slice);
    DoSomething2(slice);
}

public unsafe ReadOnlySpan<double> Slice(double[,] array, int index0, int startIndex1, int sliceLength)
{
    int arrayLength = array.GetLength(0) * array.GetLength(1);
    int arrayStartIndex = index0 * array.GetLength(1) + startIndex1;
    ReadOnlySpan<double> slice;
    fixed (double* arrayPtr = array)
    {
        slice = new ReadOnlySpan<double>(arrayPtr, arrayLength).Slice(arrayStartIndex, sliceLength);
    }

    // does it matter if slice is returned inside or outside of the fixed block?
    return slice;
}

public void DoSomething1(ReadOnlySpan<double> slice)
{
    ...
}

public void DoSomething2(ReadOnlySpan<double> slice)
{
    ...
}

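The fixed block ensures that the GC does not move array while I am creating slice. After slice has been created, if the GC moves array, will it also update slice to reference the new address of array, or will slice still reference the old address? In other words, will DoSomething1(...) and DoSomething2(...) always operate on the intended slice of the original array, or could they inadvertently operate on some arbitrary block of memory?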

Also, does it matter whether return slice; is inside or outside the fixed block?

Edit: After some inspiration, I managed to write a test showing that V0ldek is correct that the GC updates the ReadOnlySpan's address when it moves the parent array.

public static unsafe void ReadOnlySpanTest()
{
    // create 2D array
    double[,] array = new double[,] { {1, 2, 3}, {4, 5, 6} };

    // parameters to convert 2D array to 1D span
    int arrayLength = array.GetLength(0) * array.GetLength(1);
    int sliceStartIndex = 1;
    int sliceLength = 2;

    // create span
    IntPtr arrayAddressBeforeMove;
    ReadOnlySpan<double> spanFromPointer;
    fixed (double* arrayPtr = array)
    {
        arrayAddressBeforeMove = (IntPtr)arrayPtr;

        // spanFromPointer should contain { 2, 3 }
        spanFromPointer = new ReadOnlySpan<double>(arrayPtr, arrayLength).Slice(sliceStartIndex, sliceLength);
    }

    // trick GC into moving the array
    GC.AddMemoryPressure(10000000);
    GC.Collect();
    GC.RemoveMemoryPressure(10000000);

    // check array address and span contents again
    IntPtr arrayAddressAfterMove;
    fixed (double* arrayPtr = array)
    {
        // arrayAddressAfterMove should be different from arrayAddressBeforeMove
        arrayAddressAfterMove = (IntPtr) arrayPtr;

        // spanFromPointer should still contain { 2, 3 }
    }
}

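Stepping through ReadOnlySpanTest() in the debugger, I can see that arrayAddressAfterMove != arrayAddressBeforeMove, which shows that the GC did move my array. I can also see that spanFromPointer contains { 2, 3 } both before and after the array is moved. So it does not matter that the ReadOnlySpan was created inside a fixed block; it is still safe to use after leaving the fixed block.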
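The debugger observation above can also be checked in code. The following is only a sketch: SpanAliasCheck and StartsAt are hypothetical names, not part of the original test, and it assumes Unsafe.AreSame is available to the project (on some targets this may require the System.Runtime.CompilerServices.Unsafe package). It compares the managed reference at the start of the span with a reference to a given array element.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class SpanAliasCheck
{
    // Hypothetical helper: returns true if the first element of `span`
    // is the very same memory location as array[row, column].
    public static bool StartsAt(ReadOnlySpan<double> span, double[,] array, int row, int column)
    {
        // Managed reference to the first element the span covers.
        ref double spanStart = ref MemoryMarshal.GetReference(span);

        // Both refs are tracked by the GC, so the comparison stays
        // meaningful even if the array has been relocated.
        return Unsafe.AreSame(ref spanStart, ref array[row, column]);
    }
}
```

In ReadOnlySpanTest(), calling SpanAliasCheck.StartsAt(spanFromPointer, array, 0, 1) after GC.Collect() would be one way to confirm that the span still aliases the array, rather than merely holding equal values.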

Once a Span<T>, ReadOnlySpan<T>, or Memory<T> has been created, all subsequent uses of it are safe.

Here's a reference by Stephen Toub.

First, Span is a value type containing a ref and a length, defined approximately as follows:

public readonly ref struct Span<T>
{
  private readonly ref T _pointer;
  private readonly int _length;
  ...
}

The concept of a ref T field may be strange at first—in fact, one can’t actually declare a ref T field in C# or even in MSIL. But Span is actually written to use a special internal type in the runtime that’s treated as a just-in-time (JIT) intrinsic, with the JIT generating for it the equivalent of a ref T field.

Span is a ref-like type as it contains a ref field, and ref fields can refer not only to the beginning of objects like arrays, but also to the middle of them (...) These references are called interior pointers, and tracking them is a relatively expensive operation for the .NET runtime’s garbage collector. As such, the runtime constrains these refs to only live on the stack, as it provides an implicit low limit on the number of interior pointers that might be in existence.

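So the GC really does track the pointer held by your ReadOnlySpan<T>, which means the span is always safe to use once it has been constructed. The span will always point into the array you sliced, and it does not matter where you return it from. Exactly how this is implemented is specific to the CLR. The keywords to search for are "managed pointer" and "interior pointer". If you want more of the nitty-gritty, I recommend this article.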
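Since the span ends up holding a managed reference either way, one possible alternative is to build it from a ref to the first element and skip the pointer and the fixed block altogether. This is a sketch, not the method used in the question: RowSlicer and SliceRow are hypothetical names, and it assumes a zero-based, rank-2 array. MemoryMarshal.CreateReadOnlySpan does no bounds checking of its own, so the helper validates its arguments first.

```csharp
using System;
using System.Runtime.InteropServices;

public static class RowSlicer
{
    // Hypothetical helper: a read-only view over `length` consecutive
    // elements of row `row`, starting at column `startColumn`.
    public static ReadOnlySpan<double> SliceRow(double[,] array, int row, int startColumn, int length)
    {
        // CreateReadOnlySpan performs no validation, so check the range here.
        if ((uint)row >= (uint)array.GetLength(0))
            throw new ArgumentOutOfRangeException(nameof(row));
        if (startColumn < 0 || length < 0 || startColumn > array.GetLength(1) - length)
            throw new ArgumentOutOfRangeException(nameof(length));

        // Avoid indexing one past the end of the row for an empty slice.
        if (length == 0)
            return ReadOnlySpan<double>.Empty;

        // Elements of a row are contiguous in memory, so a ref to the
        // first element plus a length describes the slice.
        return MemoryMarshal.CreateReadOnlySpan(ref array[row, startColumn], length);
    }
}
```

With the array from the test in the question, RowSlicer.SliceRow(array, 0, 1, 2) would cover the elements 2 and 3, and the method needs neither the unsafe modifier nor the AllowUnsafeBlocks compiler option.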

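Have you considered the Microsoft.Data.Analysis NuGet package? Once you have populated a DataFrame df with data, getting a row (the equivalent of your Slice method) is as simple as df.Rows[rowIndex]. To access each value in the returned row, you can use the indexer again: df.Rows[rowIndex][columnIndex].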