Plotting residuals of masked values with `statsmodels`

I'm using `statsmodels.api` to compute the statistical parameters of an OLS fit between two variables:

import numpy as np
import statsmodels.api as sm
from scipy import stats

def computeStats(x, y, yName):
    '''
    Takes two arrays, x and y, and a string yName naming the y array.
    Uses Ordinary Least Squares to compute the statistical parameters for the
    array against log(z), and determines the equation for the line of best fit.
    Returns the fit results, the residuals, the linregress parameters, the
    fitted line, and the best-fit equation.
    '''

    #   Mask NaN values in both axes
    mask = ~np.isnan(y) & ~np.isnan(x)
    #   Compute model parameters (missing='drop' discards rows with a NaN in x or y)
    model = sm.OLS(y, sm.add_constant(x), missing='drop')
    results = model.fit()
    residuals = results.resid   # one residual per row kept in the fit

    #   Compute fit parameters
    params = stats.linregress(x[mask], y[mask])
    fit = params[0]*x + params[1]
    #   Raw string, so \pm and \times reach mathtext intact ('\t' would become a tab)
    fitEquation = r'$(%s)=(%.4g \pm %.4g) \times redshift+%.4g$'%(yName,
                    params[0],  #   slope
                    params[4],  #   stderr in slope
                    params[1])  #   y-intercept
    return results, residuals, params, fit, fitEquation

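The second part of the function (using `stats.linregress`) handles the masked values fine, but the `statsmodels` part doesn't. When I try to plot the residuals against the x values with `plt.scatter(x, resids)`, the sizes don't match: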

ValueError: x and y must be the same size

This is because there are 29007 x values but only 11763 residuals (the number of y values left after masking). I tried changing the `model` variable to

model = sm.OLS(y[mask], sm.add_constant(x[mask]), missing= 'drop')

but it made no difference.

How can I scatter-plot the residuals against the x values they correspond to?

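Hi @jim421616, since statsmodels drops the rows that contain missing values, you should scatter the residuals against the model's `exog` variable, as shown below (here `model` is the fitted results object, so in your code this would be `results.model.exog[:,1]` and `results.resid`):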

plt.scatter(model.model.exog[:,1], model.resid)

For reference, here is a complete dummy example:

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Generate data
x = np.random.rand(1000)
y = np.sin(x*25) + 0.1*np.random.rand(1000)

# Set some entries to NaN
y[np.random.choice(np.arange(1000), size=100)] = np.nan
x[np.random.choice(np.arange(1000), size=80)] = np.nan

# Fit the model; missing='drop' discards every row with a NaN in x or y
model = sm.OLS(y, sm.add_constant(x), missing='drop').fit()
print(model.summary())

# Plot the residuals against the x values actually used in the fit
# (column 0 of exog is the constant, column 1 is x)
plt.scatter(model.model.exog[:,1], model.resid)
plt.show()