C++（LAPACK，sgels）和Python（Numpy，lstsq）结果的区别

The difference between C++ (LAPACK, sgels) and Python (Numpy, lstsq) results

我正在比较 C++ 和 Python 计算的数值结果。在 C++ 中，我利用 LAPACK 的 sgels 函数来计算线性回归问题的系数。在 Python 中，我使用 Numpy 的 linalg.lstsq 函数来完成类似的任务。

sgels 和 linalg.lstsq 使用的方法在数学上有什么区别？

在数值上比较结果（即回归系数）时，预期误差是多少（例如 6 位有效数字）？

仅供参考：我绝不是 C++ 或 Python 专家，因此很难理解函数内部发生的事情。

查看 numpy 的源代码，在文件 linalg.py 中，lstsq 依赖于 LAPACK 的 zgelsd() 进行复数，dgelsd() 进行实数。以下是与 sgels() 的区别：

dgelsd() 用于 double 而 sgels() 用于 float。有精度差异...
dgels() 使用矩阵 A 的 QR 因式分解并假设 A 具有满秩。矩阵的条件数必须合理才能得到显着的结果。逻辑参见 this course for getting the logic of the method. On the other hand, dgelsd() makes use of the Singular value decomposition of A. In particular, A can be rank defiencient and small singular values are discarted depending of the additional argument rcond or machine precision. Notice that numpy's default value for rcond is -1: negative values refers to machine precision. See this course。
根据 benchmark of LAPACK，预计 dgels() 比 dgelsd() 快 5 倍。

如果矩阵条件不佳，您可能会发现 sgels() 和 dgelsd() 的结果存在显着差异。实际上，线性回归的误差存在界限，这取决于算法和所使用的 rcond() 的值。有关技术详细信息，请参阅 the user guide of LAPACK on, Error Bounds for Linear Least Squares Problems for estimates of the errors and Further Details: Error Bounds for Linear Least Squares Problems。

作为结论，如果b中的测量准确且容易与解释变量相关，则可以使用sgels()和dgels()。例如，如果将传感器放置在排气管的出口处，则很容易猜出哪些电机是运行。但有时，源和测量之间的线性 link 并不准确知道（A 项的不确定性）或基于测量区分污染者变得更加困难（一些污染者远离传感器组和A 是 ill-conditionned)。在这种情况下，dgelsd() 和调整 rcond 参数会有所帮助。 如有疑问，请使用dgelsd()并根据LAPACK's user guide估计x的误差。

C++（LAPACK，sgels）和Python（Numpy，lstsq）结果的区别

The difference between C++ (LAPACK, sgels) and Python (Numpy, lstsq) results

c++

python

numpy

lapack