Python Statsmodels QuantReg Intercept
Problem Setup
In the statsmodels Quantile Regression example, the Least Absolute Deviation summary output shows the Intercept. That example uses the formula interface:
from __future__ import print_function
import patsy
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
from statsmodels.regression.quantile_regression import QuantReg
data = sm.datasets.engel.load_pandas().data
mod = smf.quantreg('foodexp ~ income', data)
res = mod.fit(q=.5)
print(res.summary())
                         QuantReg Regression Results
==============================================================================
Dep. Variable:                foodexp   Pseudo R-squared:               0.6206
Model:                       QuantReg   Bandwidth:                       64.51
Method:                 Least Squares   Sparsity:                        209.3
Date:                Fri, 09 Oct 2015   No. Observations:                  235
Time:                        15:44:23   Df Residuals:                      233
                                        Df Model:                            1
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     81.4823     14.634      5.568      0.000        52.649   110.315
income         0.5602      0.013     42.516      0.000         0.534     0.586
==============================================================================
The condition number is large, 2.38e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
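For context, fitting with q=.5 is what makes this a least absolute deviation (median) regression: the fit minimizes the quantile "check" loss, which at q = 0.5 is half the sum of absolute residuals. The sketch below is purely illustrative and is not how statsmodels computes the fit; the names check_loss and loss_for_constant and the toy numbers are made up for this example.

```python
def check_loss(residuals, q):
    """Quantile (pinball) loss: q*r for r >= 0, (q-1)*r for r < 0, summed."""
    return sum(q * r if r >= 0 else (q - 1) * r for r in residuals)


# Toy data (hypothetical): an intercept-only model predicts one constant c.
y = [1.0, 2.0, 3.0, 4.0, 100.0]


def loss_for_constant(c, q=0.5):
    """Check loss when every observation is predicted as the constant c."""
    return check_loss([yi - c for yi in y], q)


# Search over the observed values for the constant with the lowest loss.
best = min(y, key=loss_for_constant)
print(best)  # the sample median, 3.0
```

The outlier at 100.0 does not pull the answer away from 3.0, which is the usual reason for preferring a median regression over least squares.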
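Question
How can I get the summary output with an Intercept without using the statsmodels.formula.api as smf formula approach?
Of course, as I was putting this question together, I figured it out. Rather than delete it, I'll share it in case anyone else runs into the same problem.
As I suspected, I needed add_constant(), but I wasn't sure how to use it. I was doing something silly: adding the constant to the Y (endog) variable instead of the X (exog) variable.
Answer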
from __future__ import print_function
import patsy
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
from statsmodels.regression.quantile_regression import QuantReg
data = sm.datasets.engel.load_pandas().data
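# add_constant() adds a 'const' column of 1s to the DataFrame
data = sm.add_constant(data)
# endog (y) comes first; the constant belongs in exog (X), not in y
mod = QuantReg(data['foodexp'], data[['const', 'income']])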
res = mod.fit(q=.5)
print(res.summary())
                         QuantReg Regression Results
==============================================================================
Dep. Variable:                foodexp   Pseudo R-squared:               0.6206
Model:                       QuantReg   Bandwidth:                       64.51
Method:                 Least Squares   Sparsity:                        209.3
Date:                Fri, 09 Oct 2015   No. Observations:                  235
Time:                        22:24:47   Df Residuals:                      233
                                        Df Model:                            1
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const         81.4823     14.634      5.568      0.000        52.649   110.315
income         0.5602      0.013     42.516      0.000         0.534     0.586
==============================================================================
The condition number is large, 2.38e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
FYI, I found it interesting that add_constant() simply adds a column of 1s to your data. More information about add_constant() can be found here.
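To make the "column of 1s" point concrete, here is a plain-Python sketch of why that column produces an intercept: each row of X becomes [1, income], so the fitted value is the dot product of that row with [const, income coefficient]. The helpers add_ones_column and predict are hypothetical and only mimic the idea behind add_constant(); the coefficients are the rounded values from the summary above, and the income values are made up.

```python
def add_ones_column(values):
    """Mimic the idea of add_constant(): prepend 1.0 to every row."""
    return [[1.0, v] for v in values]


def predict(row, params):
    """Dot product of one design-matrix row with the coefficient vector."""
    return sum(x * b for x, b in zip(row, params))


params = [81.4823, 0.5602]   # const and income, rounded, from the summary
incomes = [0.0, 1000.0]      # made-up inputs for illustration
X = add_ones_column(incomes)

for row in X:
    # The leading 1.0 multiplies the const coefficient, giving the intercept.
    print(row, round(predict(row, params), 4))
```

With income set to 0, the prediction is just the const coefficient, which is exactly what an intercept means.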