Linear regression problems with statsmodels

I have a pandas DataFrame that looks like this:

   broker-value-current  broker-value-prior      consensus-after  
                 590.00              510.00              462.55   
                  32.74               31.98               30.72   
                  33.00               30.00               30.04 

           pctch_broker      pctch_consensus    pctch_frstrec_eps 
              15.686275             1.599051             1.421657   
               2.376485             0.195695           -82.098455   
              10.000000             0.805369           -82.098455  

      pctch_frstrec_rev  
               1.243782  
              -1.258936  
              -1.258936 

The last few columns were created like this:

 data['pctch_broker'] = ((data['broker-value-current']-data['broker-value-prior'])/data['broker-value-prior'])*100
 data['pctch_consensus'] = ((data['consensus-after']-data['consensus-before'])/data['consensus-before'])*100
 data['pctch_frstrec_eps'] = ((data['frstrec_eps_announced']-data['frstrec_eps_forecast'])/data['frstrec_eps_forecast'])*100
 data['pctch_frstrec_rev'] = ((data['frstrec_rev_announced']-data['frstrec_rev_forecast'])/data['frstrec_rev_forecast'])*100

I also dropped the NA values with this line:

cleaned_data = data.dropna()

I then import the statsmodels formula API:

 import statsmodels.formula.api as sm

However, when I try to regress 'pctch_consensus' or 'pctch_broker' (the dependent variable) on 'pctch_frstrec_rev' or 'pctch_frstrec_eps' (the independent variable):

 reg1 = sm.ols(formula="pctch_consensus ~ pctch_frstrec_rev", data=cleaned_data).fit()

I get this warning:

RuntimeWarning: invalid value encountered in greater
  return (S > tol).sum(axis=-1)

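This happens because your DataFrame contains infinite values. You most likely created them by dividing by zero when computing the new percentage-change columns. Note that dropna() only removes NaN, not inf or -inf, so those values survive your cleaning step.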
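To illustrate the cause, here is a minimal sketch on a made-up three-row frame (the column names `announced`, `forecast` and `pctch`, and the values, are hypothetical and not taken from your data). A zero denominator produces `inf` rather than raising an exception, and `dropna()` leaves it in place:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the second forecast is zero, the third is missing.
toy = pd.DataFrame({
    "announced": [1.2, 0.5, 3.0],
    "forecast": [1.0, 0.0, np.nan],
})

# Same percentage-change pattern as in the question.
toy["pctch"] = (toy["announced"] - toy["forecast"]) / toy["forecast"] * 100

survivors = toy.dropna()  # drops the NaN row, keeps the inf row
n_inf = int(np.isinf(survivors["pctch"]).sum())

print(survivors)
print("infinite values left after dropna():", n_inf)
```

Running the same `np.isinf(...).sum()` check on your own percentage-change columns should show whether this is what is happening.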

This should fix it:

import numpy as np
cleaned_data = data.replace([np.inf, -np.inf], np.nan).dropna()
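If you only want to drop rows that are bad in the columns the regression actually uses, something along these lines might work. This is a sketch under assumptions: `drop_non_finite` is a hypothetical helper name, and the small frame stands in for your real `data`.

```python
import numpy as np
import pandas as pd


def drop_non_finite(df, columns):
    """Return a copy of df without rows where any of `columns` is NaN or +/-inf."""
    subset = df[columns].replace([np.inf, -np.inf], np.nan)
    return df.loc[subset.notna().all(axis=1)].copy()


# Hypothetical stand-in for the question's `data`.
data = pd.DataFrame({
    "pctch_consensus": [1.6, 0.2, 0.8, np.nan],
    "pctch_frstrec_rev": [1.24, np.inf, -1.26, 0.5],
})

cleaned_data = drop_non_finite(data, ["pctch_consensus", "pctch_frstrec_rev"])

# Sanity check before fitting: every remaining value should be finite.
all_finite = bool(np.isfinite(cleaned_data.to_numpy()).all())

print(cleaned_data)
print("all finite:", all_finite)
```

Restricting the check to the model's columns avoids throwing away rows just because some unrelated column is missing. The resulting `cleaned_data` can then be passed to `sm.ols(...)` exactly as in the question.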