System.Numerics.Vector<T> Initialization Performance on .NET Framework
System.Numerics.Vector<T> brings SIMD support to both .NET Core and .NET Framework (4.6 and later).
// Baseline: plain scalar loop
// (left, right and results are float[] fields of the benchmark class)
public void SimpleSumArray()
{
    for (int i = 0; i < left.Length; i++)
        results[i] = left[i] + right[i];
}
// Using Vector<T> for SIMD support
// (floatSlots is the number of floats per vector, i.e. Vector<float>.Count)
public void SimpleSumVectors()
{
    int ceiling = left.Length / floatSlots * floatSlots;
    for (int i = 0; i < ceiling; i += floatSlots)
    {
        Vector<float> v1 = new Vector<float>(left, i);
        Vector<float> v2 = new Vector<float>(right, i);
        (v1 + v2).CopyTo(results, i);
    }
    // Scalar loop for the remaining elements that do not fill a whole vector
    for (int i = ceiling; i < left.Length; i++)
    {
        results[i] = left[i] + right[i];
    }
}
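The snippets above rely on a floatSlots field that is not shown; since the loops step by it and load one vector per step, it presumably holds Vector<float>.Count. A minimal, self-contained sketch of that arithmetic, also reading Vector.IsHardwareAccelerated to see whether Vector<T> is actually mapped to SIMD instructions at run time (the array length is a hypothetical value):

```csharp
using System;
using System.Numerics;

// Number of float lanes in one Vector<float>; hardware dependent
// (typically 4 with 128-bit SSE registers, 8 with 256-bit AVX2).
int floatSlots = Vector<float>.Count;

// True when the JIT turns Vector<T> operations into SIMD instructions;
// when false, Vector<T> still works but uses a slower software fallback.
bool accelerated = Vector.IsHardwareAccelerated;

// Hypothetical array length that is not a multiple of the lane count.
int length = 1000003;

// The vector loop covers [0, ceiling); a scalar loop handles the last `tail` elements.
int ceiling = length / floatSlots * floatSlots;
int tail = length - ceiling;

Console.WriteLine($"floatSlots={floatSlots}, accelerated={accelerated}, ceiling={ceiling}, tail={tail}");
```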
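Unfortunately, initializing the Vector<T> can be the limiting step. To work around this, several sources recommend using MemoryMarshal to reinterpret the source arrays as spans of vectors, without copying [1][2]. For example: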
// Improving Vector<T> Initialization Performance
// (needs: using System.Numerics; using System.Runtime.InteropServices;)
public void SimpleSumVectorsNoCopy()
{
    int numVectors = left.Length / floatSlots;
    int ceiling = numVectors * floatSlots;
    // leftMemory is simply a ReadOnlyMemory<float> referring to the "left" array;
    // rightMemory and resultsMemory are the analogous wrappers for "right" and "results"
    ReadOnlySpan<Vector<float>> leftVecArray = MemoryMarshal.Cast<float, Vector<float>>(leftMemory.Span);
    ReadOnlySpan<Vector<float>> rightVecArray = MemoryMarshal.Cast<float, Vector<float>>(rightMemory.Span);
    Span<Vector<float>> resultsVecArray = MemoryMarshal.Cast<float, Vector<float>>(resultsMemory.Span);
    for (int i = 0; i < numVectors; i++)
        resultsVecArray[i] = leftVecArray[i] + rightVecArray[i];
    // Scalar loop for the remaining elements that do not fill a whole vector
    for (int i = ceiling; i < left.Length; i++)
        results[i] = left[i] + right[i];
}
This gives a significant performance improvement when running on .NET Core:
| Method | Mean | Error | StdDev |
|----------------------- |----------:|----------:|----------:|
| SimpleSumArray | 165.90 us | 0.1393 us | 0.1303 us |
| SimpleSumVectors | 53.69 us | 0.0473 us | 0.0443 us |
| SimpleSumVectorsNoCopy | 31.65 us | 0.1242 us | 0.1162 us |
Unfortunately, on .NET Framework this way of initializing the vectors has the opposite effect and actually makes performance worse:
| Method | Mean | Error | StdDev |
|----------------------- |----------:|---------:|---------:|
| SimpleSumArray | 152.92 us | 0.128 us | 0.114 us |
| SimpleSumVectors | 52.35 us | 0.041 us | 0.038 us |
| SimpleSumVectorsNoCopy | 77.50 us | 0.089 us | 0.084 us |
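For reference, a minimal self-contained sketch of the MemoryMarshal.Cast technique used in SimpleSumVectorsNoCopy, operating on plain arrays instead of the Memory<float> fields. The SumWithCast name and the test data are hypothetical. It also illustrates that the cast span only covers whole vectors, so the leftover floats need a scalar loop:

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

// Hypothetical data: 21 elements, not a multiple of the usual lane counts (4 or 8).
float[] left = new float[21];
float[] right = new float[21];
float[] results = new float[21];
for (int k = 0; k < left.Length; k++)
{
    left[k] = k;
    right[k] = 2 * k;
}

SumWithCast(left, right, results);
Console.WriteLine(string.Join(", ", results));

// Assumes a, b and dst have the same length.
static void SumWithCast(float[] a, float[] b, float[] dst)
{
    // Reinterpret each float span as a span of Vector<float>; no data is copied.
    // Each vector span has a.Length / Vector<float>.Count elements (rounded down).
    ReadOnlySpan<Vector<float>> va = MemoryMarshal.Cast<float, Vector<float>>(a.AsSpan());
    ReadOnlySpan<Vector<float>> vb = MemoryMarshal.Cast<float, Vector<float>>(b.AsSpan());
    Span<Vector<float>> vdst = MemoryMarshal.Cast<float, Vector<float>>(dst.AsSpan());

    for (int i = 0; i < va.Length; i++)
        vdst[i] = va[i] + vb[i];

    // The cast drops trailing floats that do not fill a whole vector,
    // so they are added with a scalar loop.
    for (int i = va.Length * Vector<float>.Count; i < a.Length; i++)
        dst[i] = a[i] + b[i];
}
```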
Is there a way to optimize Vector<T> initialization on .NET Framework and get performance similar to .NET Core? The measurements were taken with this sample application [1].
[1] https://github.com/CBGonzalez/SIMDPerformance
[2]
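As far as I know, the only efficient way to load a vector on .NET Framework 4.6 or 4.7 (presumably this will all change in 5.0) is with unsafe code, for example using Unsafe.Read<Vector<float>> (or its unaligned variant, Unsafe.ReadUnaligned, if applicable):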
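// Requires compiling with /unsafe (AllowUnsafeBlocks)
public unsafe void SimpleSumVectors()
{
    int ceiling = left.Length / floatSlots * floatSlots;
    // Pin the arrays so the GC cannot move them while raw pointers are in use
    fixed (float* leftp = left, rightp = right, resultsp = results)
    {
        for (int i = 0; i < ceiling; i += floatSlots)
        {
            // Load a whole Vector<float> from each input, add them, and store the sum
            Unsafe.Write(resultsp + i,
                Unsafe.Read<Vector<float>>(leftp + i) + Unsafe.Read<Vector<float>>(rightp + i));
        }
    }
    for (int i = ceiling; i < left.Length; i++)
    {
        results[i] = left[i] + right[i];
    }
}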
This uses the System.Runtime.CompilerServices.Unsafe package, which is available from NuGet, but it can also be done without it.
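A sketch of the unaligned variant, assuming a runtime where the Unsafe class is available (included with recent .NET versions, or from the System.Runtime.CompilerServices.Unsafe NuGet package on .NET Framework). Instead of pinned pointers it uses the ref-based Unsafe.ReadUnaligned / Unsafe.WriteUnaligned overloads together with Unsafe.As, so it needs neither an unsafe context nor a fixed statement. The SumUnaligned name and the test data are hypothetical, and this version has not been benchmarked against the ones above:

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;

// Hypothetical data: 19 elements, so the scalar tail loop is exercised.
float[] left = new float[19];
float[] right = new float[19];
float[] results = new float[19];
for (int k = 0; k < left.Length; k++)
{
    left[k] = k * 0.5f;
    right[k] = 1.0f;
}

SumUnaligned(left, right, results);
Console.WriteLine(string.Join(", ", results));

static void SumUnaligned(float[] a, float[] b, float[] dst)
{
    // The unaligned reads/writes below skip bounds checks, so validate lengths up front.
    if (b.Length < a.Length || dst.Length < a.Length)
        throw new ArgumentException("b and dst must be at least as long as a");

    int slots = Vector<float>.Count;        // plays the role of floatSlots
    int ceiling = a.Length / slots * slots; // end of the region covered by whole vectors
    for (int i = 0; i < ceiling; i += slots)
    {
        // Reinterpret ref float as ref byte and read a whole vector, with no alignment requirement.
        Vector<float> va = Unsafe.ReadUnaligned<Vector<float>>(ref Unsafe.As<float, byte>(ref a[i]));
        Vector<float> vb = Unsafe.ReadUnaligned<Vector<float>>(ref Unsafe.As<float, byte>(ref b[i]));
        Unsafe.WriteUnaligned(ref Unsafe.As<float, byte>(ref dst[i]), va + vb);
    }
    // Scalar loop for the remaining elements
    for (int i = ceiling; i < a.Length; i++)
        dst[i] = a[i] + b[i];
}
```

Because i + Vector<float>.Count never exceeds ceiling, every unchecked vector access stays inside the arrays once the length check has passed.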