Predicting the intercept and coefficients of a linear regression model with multiple variables
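I have the following equation:
P = B0 + B1*Var1 + B2*Var2
I have values for P, Var1 and Var2. I tried to model this and then compute the coefficients and the intercept.
Below are the code and the output I get: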
P = [1035.89, 1060.4, 1064, 1075.89, 1078.69, 1074.93, 1090.71, 1080.95, 1086.19,1080.46] # Total power
l = [51.275510204081634, 102.89115646258503, 160.7142857142857, 205.78231292517006, 256.80272108843536, 307.82312925170066, 360.5442176870748, 409.0136054421768, 460.03401360544217, 492.3469387755102]
t = [6.110918671507064, 12.262374116954474, 19.153625686813186, 24.524748233908948, 30.60526432496075, 36.685780416012555, 42.96898037676609, 48.7454706632653, 54.82598675431711, 58.67698027864992]
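# Build the design matrix row by row; note the column order is [t, l]
X = []
for index in range(0, len(P)):
    row = []
    row.append(t[index])
    row.append(l[index])
    X.append(row)
print("Using statsmodels")
import statsmodels.api as sm
X = sm.add_constant(X)  # prepend a column of ones for the intercept
est = sm.OLS(P, X).fit()
print(est.params[0])  # intercept
print(est.params[1])  # coefficient of t
print(est.params[2])  # coefficient of l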
The output I get is:
Using statsmodels
1048.32518503
0.0102496334198
0.0860026475829
Is this correct? Does est.params[0] refer to B0 in the equation?
When I run the experiment, I get B0 in the range 600-650.
Could the mismatch be caused by errors in the data?
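I am not familiar with statsmodels, but here is an implementation using curve_fit (see the code below). In my view, the model predictions do not match what you observe experimentally because your model (B0 + B1*Var1 + B2*Var2) does not describe the data well (an exponential, log or sqrt form might work better). In the figures that follow, I show the original data, the fit obtained with curve_fit (code below) and the fit using your parameters.
As you can see, both fits give the same result. However, I think your data should be modeled by a different function. If I have time, I will look for a function that fits your data better.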
from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt
P = [1035.89, 1060.4, 1064, 1075.89, 1078.69, 1074.93, 1090.71, 1080.95, 1086.19,1080.46] # Total power
l = [51.275510204081634, 102.89115646258503, 160.7142857142857, 205.78231292517006, 256.80272108843536, 307.82312925170066, 360.5442176870748, 409.0136054421768, 460.03401360544217, 492.3469387755102]
t = [6.110918671507064, 12.262374116954474, 19.153625686813186, 24.524748233908948, 30.60526432496075, 36.685780416012555, 42.96898037676609, 48.7454706632653, 54.82598675431711, 58.67698027864992]
# your model: P = b0 + b1*var1 + b2*var2
def func(x, b0, b1, b2):
    var1, var2 = x
    return b0 + b1 * np.asarray(var1) + b2 * np.asarray(var2)
# Curve fit (var1 = l, var2 = t)
coeff, _ = curve_fit(func, (l, t), P)
b0, b1, b2 = coeff
print(b0, b1, b2)
# plot the data against the curve_fit result
xval = range(1, len(P) + 1)
plt.scatter(xval, P, s=30, marker="v", label='P')
plt.scatter(xval, func((l, t), *coeff), s=30, marker="v", color="red", label='curvefit')
plt.legend(loc='upper left')
# plot the data against the statsmodels parameters (b1 belongs to l, b2 to t)
plt.figure()
plt.scatter(xval, P, s=30, marker="v", label='P')
plt.scatter(xval, func((l, t), 1048.32518503, 0.0860026475829, 0.0102496334198), s=30, marker="v", color="black", label='your parameter')
plt.legend(loc='upper left')
plt.show()
# sum of squared residuals for both parameter sets
print("residuals curve_fit:", ((np.asarray(P) - func((l, t), *coeff))**2).sum())
print("residuals stats:", ((np.asarray(P) - func((l, t), 1048.32518503, 0.0860026475829, 0.0102496334198))**2).sum())
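One thing that may be worth checking before trusting B1 and B2 individually: in the posted data, t looks like a constant multiple of l. If that holds, the two regressors carry the same information, only the combined slope is determined by the data, and any split between B1 and B2 that gives the same combined slope fits equally well. The dependency-free sketch below checks this. The helper `fit_line` and the names `k`, `spread` and `combined` are illustrative and not part of the original code.

```python
# Data copied from the question.
P = [1035.89, 1060.4, 1064, 1075.89, 1078.69, 1074.93, 1090.71, 1080.95, 1086.19, 1080.46]
l = [51.275510204081634, 102.89115646258503, 160.7142857142857, 205.78231292517006,
     256.80272108843536, 307.82312925170066, 360.5442176870748, 409.0136054421768,
     460.03401360544217, 492.3469387755102]
t = [6.110918671507064, 12.262374116954474, 19.153625686813186, 24.524748233908948,
     30.60526432496075, 36.685780416012555, 42.96898037676609, 48.7454706632653,
     54.82598675431711, 58.67698027864992]

# Ratio t/l per sample; a (nearly) constant ratio means t adds nothing beyond l.
ratios = [ti / li for ti, li in zip(t, l)]
k = sum(ratios) / len(ratios)
spread = (max(ratios) - min(ratios)) / k
print("mean t/l ratio:", k, "relative spread:", spread)


def fit_line(x, y):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Single-regressor fit: P = intercept + slope * l
intercept, slope = fit_line(l, P)
print("intercept:", intercept, "slope on l:", slope)

# Combined slope implied by the parameters reported in the question,
# using t = k * l:  B_l + B_t * k
combined = 0.0860026475829 + 0.0102496334198 * k
print("combined slope from the reported parameters:", combined)
```

If the spread is tiny and the two slopes agree, the intercept of about 1048 is what a straight-line fit to this data gives, and a B0 of 600-650 would have to come from a different model or different data rather than from a different split of B1 and B2.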
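To make the suggestion of a different functional form concrete, here is a rough sketch that compares a straight line in l with a line in log(l), using the sum of squared residuals as the yardstick. The log form is only one of the candidates mentioned above and is an assumption, not a claim about the underlying physics. The helpers `fit_line` and `sse` are hypothetical names introduced for this example.

```python
import math

# Data copied from the question (only P and l are needed here).
P = [1035.89, 1060.4, 1064, 1075.89, 1078.69, 1074.93, 1090.71, 1080.95, 1086.19, 1080.46]
l = [51.275510204081634, 102.89115646258503, 160.7142857142857, 205.78231292517006,
     256.80272108843536, 307.82312925170066, 360.5442176870748, 409.0136054421768,
     460.03401360544217, 492.3469387755102]


def fit_line(x, y):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sse(x, y, intercept, slope):
    """Sum of squared residuals of the fitted line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Model A: P = a + b * l
a_lin, b_lin = fit_line(l, P)
sse_lin = sse(l, P, a_lin, b_lin)

# Model B (assumed alternative): P = a + b * log(l)
log_l = [math.log(v) for v in l]
a_log, b_log = fit_line(log_l, P)
sse_log = sse(log_l, P, a_log, b_log)

print("linear in l:      intercept=%.3f slope=%.5f SSE=%.1f" % (a_lin, b_lin, sse_lin))
print("linear in log(l): intercept=%.3f slope=%.5f SSE=%.1f" % (a_log, b_log, sse_log))
```

A smaller SSE for the log form would support the idea that the data flatten out faster than a straight line allows. With only ten points this is an indication rather than proof, and the intercept of a log model is not directly comparable to B0 in the original equation.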