Linear regression problems with statsmodels
I have a pandas DataFrame that looks like this:
broker-value-current broker-value-prior consensus-after
590.00 510.00 462.55
32.74 31.98 30.72
33.00 30.00 30.04
pctch_broker pctch_consensus pctch_frstrec_eps
15.686275 1.599051 1.421657
2.376485 0.195695 -82.098455
10.000000 0.805369 -82.098455
pctch_frstrec_rev
1.243782
-1.258936
-1.258936
The last few columns were created like this:
data['pctch_broker'] = ((data['broker-value-current']-data['broker-value-prior'])/data['broker-value-prior'])*100
data['pctch_consensus'] = ((data['consensus-after']-data['consensus-before'])/data['consensus-before'])*100
data['pctch_frstrec_eps'] = ((data['frstrec_eps_announced']-data['frstrec_eps_forecast'])/data['frstrec_eps_forecast'])*100
data['pctch_frstrec_rev'] = ((data['frstrec_rev_announced']-data['frstrec_rev_forecast'])/data['frstrec_rev_forecast'])*100
I also dropped the NA values with this line:
cleaned_data = data.dropna()
I am using statsmodels, imported as:
import statsmodels.formula.api as sm
However, when I try to regress 'pctch_consensus' or 'pctch_broker' as the dependent variable on 'pctch_frstrec_rev' or 'pctch_frstrec_eps' as the independent variable:
reg1 = sm.ols(formula="pctch_consensus ~ pctch_frstrec_rev", data=cleaned_data).fit()
I get this warning:
RuntimeWarning: invalid value encountered in greater
  return (S > tol).sum(axis=-1)
This happens because your DataFrame contains infinite values. You most likely created them by dividing by zero when computing the new columns.
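One way to check this diagnosis is to count the infinite values in each numeric column. The sketch below uses a small made-up frame (the column names `announced`, `forecast`, and `pctch` are hypothetical stand-ins, not the asker's data) in which one forecast is zero:

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for `data`; the second forecast is zero.
data = pd.DataFrame({
    "announced": [1.2, 0.5, 0.9],
    "forecast": [1.0, 0.0, 1.0],
})
data["pctch"] = (data["announced"] - data["forecast"]) / data["forecast"] * 100

# Count +/-inf per numeric column; a nonzero count points to a zero denominator.
numeric = data.select_dtypes(include=[np.number])
inf_counts = np.isinf(numeric).sum()
print(inf_counts)
```

Any column with a nonzero count here was probably built from a denominator that contains zeros.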
This should fix it:
cleaned_data = data.replace([np.inf, -np.inf], np.nan)
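To see why `dropna()` on its own was not enough, note that pandas does not treat infinity as a missing value by default. The following is a minimal sketch on made-up numbers (the values are invented for illustration); it assumes you also want to drop the affected rows, so it chains `dropna()` after the replacement:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the second forecast is 0, so the percent change divides by zero.
data = pd.DataFrame({
    "frstrec_rev_announced": [102.0, 50.0, 98.0],
    "frstrec_rev_forecast": [100.0, 0.0, 100.0],
})
data["pctch_frstrec_rev"] = (
    (data["frstrec_rev_announced"] - data["frstrec_rev_forecast"])
    / data["frstrec_rev_forecast"]
) * 100

# dropna() alone leaves the inf row in place, because inf is not counted as missing.
after_dropna = data.dropna()
n_inf_after_dropna = int(np.isinf(after_dropna["pctch_frstrec_rev"]).sum())

# Converting inf to NaN first lets dropna() remove the offending row.
cleaned_data = data.replace([np.inf, -np.inf], np.nan).dropna()
n_inf_after_clean = int(np.isinf(cleaned_data["pctch_frstrec_rev"]).sum())

print(n_inf_after_dropna, n_inf_after_clean, len(cleaned_data))
```

Whether to drop these rows or handle the zero denominators differently depends on what a zero forecast means in your data.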