Plotting residuals of masked values with `statsmodels`
I'm using statsmodels.api to compute the statistical parameters for an OLS fit between two variables:
import numpy as np
import statsmodels.api as sm
from scipy import stats

def computeStats(x, y, yName):
    '''
    Takes as an argument an array, and a string for the array name.
    Uses Ordinary Least Squares to compute the statistical parameters for the
    array against log(z), and determines the equation for the line of best fit.
    Returns the results summary, residuals, statistical parameters in a list, and the
    best fit equation.
    '''
    # Mask NaN values in both axes
    mask = ~np.isnan(y) & ~np.isnan(x)
    # Compute model parameters
    model = sm.OLS(y, sm.add_constant(x), missing='drop')
    results = model.fit()
    residuals = results.resid
    # Compute fit parameters
    params = stats.linregress(x[mask], y[mask])
    fit = params[0]*x + params[1]
    # Raw string, so that \pm and \times are not read as escape sequences
    fitEquation = r'$(%s)=(%.4g \pm %.4g) \times redshift+%.4g$' % (yName,
                        params[0],  # slope
                        params[4],  # stderr in slope
                        params[1])  # y-intercept
    return results, residuals, params, fit, fitEquation
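The second part of the function (using stats.linregress) handles the masked values fine, but statsmodels does not. When I try to plot the residuals against the x-values with plt.scatter(x, resids), the sizes do not match: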
ValueError: x and y must be the same size
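because there are 29007 x-values and 11763 residuals (which is the number of y-values that make it through the masking procedure). I've tried changing the model variable to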
model = sm.OLS(y[mask], sm.add_constant(x[mask]), missing= 'drop')
but this had no effect.
How can I scatter-plot the residuals against the x-values they match?
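Hi @jim421616, since statsmodels drops the rows with missing values, you should use the model's exog variable for the scatter plot, as shown here (model is the fitted results object returned by .fit()).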
plt.scatter(model.model.exog[:,1], model.resid)
For reference, a complete dummy example:
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Generate data
x = np.random.rand(1000)
y = np.sin(x * 25) + 0.1 * np.random.rand(1000)

# Set some values to NaN
y[np.random.choice(np.arange(1000), size=100)] = np.nan
x[np.random.choice(np.arange(1000), size=80)] = np.nan

# Fit the model; missing='drop' discards rows with NaN in x or y
model = sm.OLS(y, sm.add_constant(x), missing='drop').fit()
print(model.summary())

# Plot the residuals against the x-values actually used in the fit
plt.scatter(model.model.exog[:, 1], model.resid)
plt.show()