Getting a list of column names and coefficients from ols.params

I am using OLS on two dataframes:

gab = ols(formula= 'only_volume ~ all_but_volume', data=data_p ).fit() 

where,

only_volume = data_p.iloc[:,0] # only the first column
all_but_volume = data_p.iloc[:, 1:data_p.shape[1]] # all but the first column

When I try to extract something, such as the params or the p-values, I get something like this:

In [3]: gab.params
Out[3]: 
Intercept             2.687598e+06
all_but_volume[0]     5.500544e+01
all_but_volume[1]     2.696902e+02
all_but_volume[2]     3.389568e+04
all_but_volume[3]    -2.385838e+04
all_but_volume[4]     5.419860e+02
all_but_volume[5]     3.815161e+02
all_but_volume[6]    -2.281344e+04
all_but_volume[7]     1.794128e+04
...
all_but_volume[22]    1.374321e+00

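Since gab.params gives 23 values on the LHS and all_but_volume has 23 columns, I was hoping there is a way to get a list/zip of the params labeled with the column names, instead of params labeled all_but_volume[i].

Like,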

TMC     9.801195e+01
TAC     2.214464e+02
...

What I have tried: removing all_but_volume and simply using data_p.iloc[:, 1:data_p.shape[1]] in the formula.

That did not work:

...
data_p.iloc[:, 1:data_p.shape[1]][21]    2.918531e+04
data_p.iloc[:, 1:data_p.shape[1]][22]    1.395342e+00

Edit: sample data:

data_p.iloc[1:5,:]
Out[31]: 
          Volume             A              B                  C\
1  569886.171878    759.089217     272.446022           4.163908   
2  561695.886128    701.165406     330.301260           4.136530   
3  627221.486089    377.746089     656.838394           4.130720   
4  625181.750625    361.489041     670.575110           4.134467   

                          D         E        F      G      H     I  \
1                  1.000842  12993.06  3371.28  236.90  4.92  6.13   
2                  0.981514  13005.44  3378.69  236.94  4.92  6.13   
3                  0.836920  13017.22  3384.47  236.98  4.93  6.13   
4                  0.810541  13028.56  3388.85  237.01  4.94  6.13   

                          J               K       L       M           N  \
1      ...                0               0       0        0          0   
2      ...                0               0       0        0          0   
3      ...                0               0       0        0          0   
4      ...                0               0       0        0          0   

           O             P     Q             R   S  
1          0             0     0             1   9202.171648  
2          0             0     0             0   4381.373520  
3          0             0     0             0 -13982.443554  
4          0             0     0             0 -22878.843149

only_volume is the first column, 'Volume'; all_but_volume is every column except 'Volume'.

You can use the DataFrame constructor or rename, because gab.params is a Series:

Sample:

import numpy as np
import pandas as pd
import statsmodels.formula.api as sm

np.random.seed(2018)
data_p = pd.DataFrame(np.random.rand(10, 5), columns=['Volume','A','B','C','D'])
print (data_p)
     Volume         A         B         C         D
0  0.882349  0.104328  0.907009  0.306399  0.446409
1  0.589985  0.837111  0.697801  0.802803  0.107215
2  0.757093  0.999671  0.725931  0.141448  0.356721
3  0.942704  0.610162  0.227577  0.668732  0.692905
4  0.416863  0.171810  0.976891  0.330224  0.629044
5  0.160611  0.089953  0.970822  0.816578  0.571366
6  0.345853  0.403744  0.137383  0.900934  0.933936
7  0.047377  0.671507  0.034832  0.252691  0.557125
8  0.525823  0.352968  0.092983  0.304509  0.862430
9  0.716937  0.964071  0.539702  0.950540  0.667982

only_volume = data_p.iloc[:,0] #Only first column
all_but_volume = data_p.iloc[:, 1:data_p.shape[1]] #All but first column
gab = sm.ols(formula= 'only_volume ~ all_but_volume', data=data_p ).fit() 
print (gab.params)
Intercept            0.077570
all_but_volume[0]    0.395072
all_but_volume[1]    0.313150
all_but_volume[2]   -0.100752
all_but_volume[3]    0.247532
dtype: float64

print (type(gab.params))
<class 'pandas.core.series.Series'>

df = pd.DataFrame({'cols':data_p.columns[1:], 'par': gab.params.values[1:]})
print (df)
  cols       par
0    A  0.395072
1    B  0.313150
2    C -0.100752
3    D  0.247532

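If you need a Series instead (note that this maps the Intercept label to Volume, so the first value is still the intercept):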

s = gab.params.rename(dict(zip(gab.params.index, data_p.columns)))
print (s)
Volume    0.077570
A         0.395072
B         0.313150
C        -0.100752
D         0.247532
dtype: float64

Series without the first value (the intercept):

s = gab.params.iloc[1:].rename(dict(zip(gab.params.index, data_p.columns)))
print (s)

A    0.395072
B    0.313150
C   -0.100752
D    0.247532
dtype: float64
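
An alternative that avoids renaming altogether is to build the formula from the column names, so that statsmodels labels each coefficient itself. The following is a minimal sketch, assuming random sample data of the same shape as above and that every column name is a valid Python identifier; the names predictors, formula and summary are illustrative and not part of the original post.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Sample data with the same shape and column names as the example above.
rng = np.random.RandomState(2018)
data_p = pd.DataFrame(rng.rand(10, 5), columns=["Volume", "A", "B", "C", "D"])

# Build "Volume ~ A + B + C + D" from the column names.
predictors = data_p.columns[1:]
formula = "Volume ~ " + " + ".join(predictors)

gab = smf.ols(formula=formula, data=data_p).fit()

# params and pvalues are both Series indexed by term name, so they align.
summary = pd.DataFrame({"coef": gab.params, "pvalue": gab.pvalues})
print(summary)
```

With this formula the index of gab.params is Intercept followed by A, B, C, D, so gab.pvalues and the other per-coefficient results carry the same labels. Column names that are not valid identifiers would need quoting in the formula, for example with Q("...") in patsy-based statsmodels versions.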