No Residuals With Numpy's Least Squares
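I'm trying to compute a least squares problem in Numpy (i.e. Ordinary Least Squares (OLS) with Simple Regression) in order to find the corresponding R² value. However, in some cases, Numpy is returning an empty list for the residuals. Take the following over-determined example (i.e. more equations than unknowns) to illustrate the problem:
(NOTE: There is no constant factor (i.e. intercept) (i.e. no initial column vector of all 1's), therefore the Uncentered Total Sum of Squares (TSS) will be used.)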
import numpy as np
A = np.array([[6, 6, 3], [40, 40, 20]]).T
y = np.array([0.5, 0.2, 0.6])
model_parameters, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
# No Intercept, therefore use Uncentered Total Sum of Squares (TSS)
uncentered_tss = np.sum((y)**2)
numpy_r2 = 1.0 - residuals / uncentered_tss
print("Numpy Model Parameter(s): " + str(model_parameters))
print("Numpy Sum of Squared Residuals (SSR): " + str(residuals))
print("Numpy R²: " + str(numpy_r2))
This produces the following output:
Numpy Model Parameter(s): [0.00162999 0.01086661]
Numpy Sum of Squared Residuals (SSR): []
Numpy R²: []
... residuals will be empty when the equations are under-determined or well-determined but return values when they are over-determined.
However, this problem is clearly over-determined (3 equations vs. 2 unknowns). I can even show that the residuals (and therefore the sum of squared residuals (SSR)) exist by computing the regression results given by statsmodels's OLS function:
import numpy as np
import statsmodels.api as sm
A = np.array([[6, 6, 3], [40, 40, 20]]).T
y = np.array([0.5, 0.2, 0.6])
statsmodels_model = sm.OLS(y, A)
regression_results = statsmodels_model.fit()
calculated_r_squared = 1.0 - regression_results.ssr / np.sum((y)**2)
print("Parameters: " + str(regression_results.params))
print("Residuals: " + str(regression_results.resid))
print("Statsmodels R²: " + str(regression_results.rsquared))
print("Manually Calculated R²: " + str(calculated_r_squared))
This produces the following output:
Parameters: [0.00162999 0.01086661]
Residuals: [ 0.05555556 -0.24444444 0.37777778]
Statsmodels R²: 0.6837606837606838
Manually Calculated R²: 0.6837606837606838
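(As you can see, the model parameters from Statsmodels and Numpy agree.)
Why does Numpy return an empty SSR array for this example? Is this a bug in numpy.linalg.lstsq? If this is not a bug, then why is Statsmodels able to compute the sum of squared residuals (SSR) while Numpy is not? Given the plane of best fit, the residuals can clearly also be computed by hand.
From the documentation of numpy.linalg.lstsq():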
residuals : {(), (1,), (K,)} ndarray
... If the rank of a is < N or M <= N, this is an empty array. ...
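Your matrix has rank 1: its second column, [40, 40, 20], is 20/3 times its first column, [6, 6, 3], so the rank is less than N = 2.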
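The rank condition can be checked directly. The following is a minimal sketch that compares the matrix from the question with a full-rank matrix. `A_full_rank` is a hypothetical matrix made up for this illustration and is not part of the original question.

```python
import numpy as np

A = np.array([[6, 6, 3], [40, 40, 20]]).T
y = np.array([0.5, 0.2, 0.6])

# The columns of A are proportional, so the numerical rank is 1 (less than N = 2).
rank_of_A = np.linalg.matrix_rank(A)
_, residuals, lstsq_rank, _ = np.linalg.lstsq(A, y, rcond=None)
print("matrix_rank(A):", rank_of_A)
print("rank reported by lstsq:", lstsq_rank)
print("residuals:", residuals)

# Hypothetical full-rank matrix (assumption, for contrast only):
# same first column, but a linearly independent second column.
A_full_rank = np.array([[6, 6, 3], [1, 2, 3]]).T
params_full, residuals_full, rank_full, _ = np.linalg.lstsq(A_full_rank, y, rcond=None)
print("rank of full-rank example:", rank_full)
print("residuals of full-rank example:", residuals_full)
```

With the rank-deficient matrix, `residuals` comes back empty, while the full-rank example returns a one-element array holding the sum of squared residuals.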
Note: the residuals you consider "missing" can also be found using numpy (you don't need another package):
residuals = y - np.dot(A, model_parameters)
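Building on that line, here is a short sketch of how the SSR and the uncentered R² from the question could be computed with Numpy alone. The variable names `ssr` and `r_squared` are chosen here for illustration.

```python
import numpy as np

A = np.array([[6, 6, 3], [40, 40, 20]]).T
y = np.array([0.5, 0.2, 0.6])

# Ignore the (empty) residuals returned by lstsq and keep only the parameters.
model_parameters, _, _, _ = np.linalg.lstsq(A, y, rcond=None)

# Per-observation residuals, computed by hand.
residuals = y - np.dot(A, model_parameters)

# Sum of squared residuals and uncentered total sum of squares (no intercept).
ssr = np.sum(residuals ** 2)
uncentered_tss = np.sum(y ** 2)
r_squared = 1.0 - ssr / uncentered_tss

print("Residuals: " + str(residuals))
print("SSR: " + str(ssr))
print("R²: " + str(r_squared))
```

This should reproduce the residuals and the R² value that Statsmodels reports in the question, up to floating-point rounding.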