Performance array multiplication Pearson
I compute the Pearson correlation (over average user/item ratings) many times, and with my current code the performance is very bad:
    public double ComputeCorrelation(double[] x, double[] y, double[] meanX, double[] meanY)
    {
        if (x.Length != y.Length)
            throw new ArgumentException("values must be the same length");

        double sumNum = 0;
        double sumDenom = 0;
        double denomX = 0;
        double denomY = 0;

        for (int a = 0; a < x.Length; a++)
        {
            sumNum += (x[a] - meanX[a]) * (y[a] - meanY[a]);
            denomX += Math.Pow(x[a] - meanX[a], 2);
            denomY += Math.Pow(y[a] - meanY[a], 2);
        }

        var sqrtDenomX = Math.Sqrt(denomX);
        var sqrtDenomY = Math.Sqrt(denomY);

        if (sqrtDenomX == 0 || sqrtDenomY == 0) return 0;

        sumDenom = Math.Sqrt(denomX) * Math.Sqrt(denomY);

        var correlation = sumNum / sumDenom;

        return correlation;
    }
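I am using the standard Pearson correlation from MathNet.Numerics elsewhere, but this is a modification of the standard formula, so I cannot use it here. Is there a way to speed this up? How can the time complexity be optimized?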
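The best way to solve your performance problem is probably to avoid computing as many correlations as possible. If you are using the correlations as part of another computation, you may be able to use math to remove the need for some of them.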
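One way to cut the repeated work can be sketched as follows. It assumes (the question does not say this) that each rating vector is always paired with the same mean array, so the differences and their norm can be computed once per vector and reused for every pair. `CenteredVector` is a hypothetical helper, not part of the question's code or of MathNet.Numerics.

```csharp
using System;

// Hypothetical helper: caches x[a] - meanX[a] and its norm for one rating vector.
// Assumption: the same (values, means) pair is reused across many correlations.
public sealed class CenteredVector
{
    public double[] Diff { get; }   // values[a] - means[a], computed once
    public double Norm { get; }     // sqrt of the sum of squared differences

    public CenteredVector(double[] values, double[] means)
    {
        if (values.Length != means.Length)
            throw new ArgumentException("values and means must be the same length");

        Diff = new double[values.Length];
        double sumSq = 0;
        for (int a = 0; a < values.Length; a++)
        {
            double d = values[a] - means[a];
            Diff[a] = d;
            sumSq += d * d;
        }
        Norm = Math.Sqrt(sumSq);
    }

    // Per pair, only the dot product of the cached differences remains.
    public static double Correlation(CenteredVector x, CenteredVector y)
    {
        if (x.Diff.Length != y.Diff.Length)
            throw new ArgumentException("values must be the same length");
        if (x.Norm == 0 || y.Norm == 0) return 0;

        double sumNum = 0;
        for (int a = 0; a < x.Diff.Length; a++)
            sumNum += x.Diff[a] * y.Diff[a];

        return sumNum / (x.Norm * y.Norm);
    }
}
```

With n vectors compared pairwise, the subtractions and square roots are then done n times instead of once per pair. If the mean arrays differ from pair to pair, this caching does not apply.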
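You should also consider whether you can use the square of the Pearson correlation instead of the Pearson correlation itself. That way you save the calls to Math.Sqrt(), which are usually quite expensive.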
If you do need to take the square roots, you should reuse sqrtDenomX and sqrtDenomY rather than recomputing the square roots.
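The squared-correlation idea can be sketched as below. A plain square throws away the sign of the correlation, so this variant returns sign(r) * r^2 instead. That mapping is increasing in r, so rankings and comparisons between correlations are preserved, although the values themselves are no longer r. `ComputeSignedSquare` is a hypothetical name, and whether the squared value is acceptable depends on how the result is used downstream.

```csharp
using System;

public static class SquaredCorrelation
{
    // Hypothetical variant of ComputeCorrelation: returns sign(r) * r^2
    // without any call to Math.Sqrt.
    public static double ComputeSignedSquare(double[] x, double[] y, double[] meanX, double[] meanY)
    {
        if (x.Length != y.Length || x.Length != meanX.Length || x.Length != meanY.Length)
            throw new ArgumentException("all arrays must be the same length");

        double sumNum = 0, denomX = 0, denomY = 0;
        for (int a = 0; a < x.Length; a++)
        {
            double diffX = x[a] - meanX[a];
            double diffY = y[a] - meanY[a];
            sumNum += diffX * diffY;
            denomX += diffX * diffX;
            denomY += diffY * diffY;
        }

        if (denomX == 0 || denomY == 0) return 0;

        // r^2 = sumNum^2 / (denomX * denomY); restore the sign of sumNum afterwards.
        double rSquared = (sumNum * sumNum) / (denomX * denomY);
        return sumNum < 0 ? -rSquared : rSquared;
    }
}
```

If the actual value of r is required, a single `Math.Sqrt(denomX * denomY)` is an alternative to two separate square roots, at the cost of slightly different rounding.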
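The only optimization I can see in your code is the one in the code below: compute each difference once and square it by multiplying instead of calling Math.Pow. If you are still looking for better performance after that, you may want to use SIMD vectorization, which lets you use the full computing power of the CPU.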
    public double ComputeCorrelation(double[] x, double[] y, double[] meanX, double[] meanY)
    {
        if (x.Length != y.Length)
            throw new ArgumentException("values must be the same length");

        double sumNum = 0;
        double sumDenom = 0;
        double denomX = 0;
        double denomY = 0;
        double diffX;
        double diffY;

        for (int a = 0; a < x.Length; a++)
        {
            // Compute each difference once and reuse it.
            diffX = (x[a] - meanX[a]);
            diffY = (y[a] - meanY[a]);
            sumNum += diffX * diffY;
            // Square by multiplication instead of Math.Pow(diff, 2).
            denomX += diffX * diffX;
            denomY += diffY * diffY;
        }

        var sqrtDenomX = Math.Sqrt(denomX);
        var sqrtDenomY = Math.Sqrt(denomY);

        if (sqrtDenomX == 0 || sqrtDenomY == 0) return 0;

        // Reuse the square roots computed above.
        sumDenom = sqrtDenomX * sqrtDenomY;

        var correlation = sumNum / sumDenom;

        return correlation;
    }
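The SIMD suggestion above is not shown in that answer. A minimal sketch of what it could look like with `System.Numerics.Vector<double>` follows. The class name `SimdCorrelation` is hypothetical, and the extra length checks on the mean arrays are an addition. Because the vector version sums in a different order, its result can differ from the scalar loop in the last few bits.

```csharp
using System;
using System.Numerics;

public static class SimdCorrelation
{
    // Hypothetical SIMD variant of ComputeCorrelation using System.Numerics.
    public static double ComputeCorrelation(double[] x, double[] y, double[] meanX, double[] meanY)
    {
        if (x.Length != y.Length || x.Length != meanX.Length || x.Length != meanY.Length)
            throw new ArgumentException("all arrays must be the same length");

        int width = Vector<double>.Count;           // lanes per vector, hardware dependent
        var vNum = Vector<double>.Zero;
        var vDenX = Vector<double>.Zero;
        var vDenY = Vector<double>.Zero;

        int a = 0;
        int lastBlock = x.Length - (x.Length % width);
        for (; a < lastBlock; a += width)
        {
            // Each iteration processes `width` elements at once.
            var dx = new Vector<double>(x, a) - new Vector<double>(meanX, a);
            var dy = new Vector<double>(y, a) - new Vector<double>(meanY, a);
            vNum += dx * dy;
            vDenX += dx * dx;
            vDenY += dy * dy;
        }

        // Horizontal sum of each accumulator (dot product with a vector of ones).
        double sumNum = Vector.Dot(vNum, Vector<double>.One);
        double denomX = Vector.Dot(vDenX, Vector<double>.One);
        double denomY = Vector.Dot(vDenY, Vector<double>.One);

        // Scalar loop for the elements that do not fill a whole vector.
        for (; a < x.Length; a++)
        {
            double diffX = x[a] - meanX[a];
            double diffY = y[a] - meanY[a];
            sumNum += diffX * diffY;
            denomX += diffX * diffX;
            denomY += diffY * diffY;
        }

        if (denomX == 0 || denomY == 0) return 0;
        return sumNum / (Math.Sqrt(denomX) * Math.Sqrt(denomY));
    }
}
```

`Vector.IsHardwareAccelerated` reports whether the runtime actually maps these operations to SIMD instructions. When it is false, this version is likely to be slower than the plain loop, so it is worth measuring on the target machine.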
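Adding to MSE's answer: changing Pow(x,2) to diff*diff is definitely something you want to do. You may also want to avoid unnecessary bounds checking in your innermost loop. This can be done using pointers in C#.
It can be done like this: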
    // Requires compiling with unsafe code enabled (AllowUnsafeBlocks).
    public unsafe double ComputeCorrelation(double[] x, double[] y, double[] meanX, double[] meanY)
    {
        // Pointer access is not bounds-checked, so validate every array length up front.
        if (x.Length != y.Length || x.Length != meanX.Length || x.Length != meanY.Length)
            throw new ArgumentException("values must be the same length");

        double sumNum = 0;
        double sumDenom = 0;
        double denomX = 0;
        double denomY = 0;
        double diffX;
        double diffY;

        int len = x.Length;

        // Pinning the arrays directly (rather than &x[0]) also works for empty arrays.
        fixed (double* xptr = x, yptr = y, meanXptr = meanX, meanYptr = meanY)
        {
            for (int a = 0; a < len; a++)
            {
                diffX = (xptr[a] - meanXptr[a]);
                diffY = (yptr[a] - meanYptr[a]);
                sumNum += diffX * diffY;
                denomX += diffX * diffX;
                denomY += diffY * diffY;
            }
        }

        var sqrtDenomX = Math.Sqrt(denomX);
        var sqrtDenomY = Math.Sqrt(denomY);

        if (sqrtDenomX == 0 || sqrtDenomY == 0) return 0;

        sumDenom = sqrtDenomX * sqrtDenomY;

        var correlation = sumNum / sumDenom;

        return correlation;
    }
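How much these changes help depends on the runtime and the hardware, so it may be worth timing them before committing to unsafe code. The sketch below is a rough Stopwatch comparison of Math.Pow(d, 2) against d * d. All names in it are hypothetical and it is not a rigorous benchmark. It prints timings but makes no claim about what they will be.

```csharp
using System;
using System.Diagnostics;

public static class PowVersusMultiply
{
    public static double SumSquaresWithPow(double[] diff)
    {
        double sum = 0;
        for (int a = 0; a < diff.Length; a++)
            sum += Math.Pow(diff[a], 2);
        return sum;
    }

    public static double SumSquaresWithMultiply(double[] diff)
    {
        double sum = 0;
        for (int a = 0; a < diff.Length; a++)
            sum += diff[a] * diff[a];
        return sum;
    }

    // Times `repetitions` calls of f and returns the elapsed milliseconds.
    public static double TimeMs(Func<double[], double> f, double[] diff, int repetitions)
    {
        f(diff); // warm-up call so JIT compilation is not included in the timing
        var sw = Stopwatch.StartNew();
        for (int r = 0; r < repetitions; r++)
            f(diff);
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    public static void Run()
    {
        // Arbitrary test data: one million values in [-2, 2).
        var rng = new Random(42);
        var diff = new double[1_000_000];
        for (int a = 0; a < diff.Length; a++)
            diff[a] = rng.NextDouble() * 4 - 2;

        Console.WriteLine($"Math.Pow:    {TimeMs(SumSquaresWithPow, diff, 20):F1} ms");
        Console.WriteLine($"diff * diff: {TimeMs(SumSquaresWithMultiply, diff, 20):F1} ms");
    }
}
```

The same `TimeMs` pattern can wrap the safe and pointer-based versions of ComputeCorrelation to see whether removing bounds checks is measurable for the array sizes in question.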