Multiply Correlation and Volatility Dataframes with Multi-Index to Get Covariance Matrix

I am trying to multiply a volatility DataFrame (rvm) with a correlation DataFrame (omega_tilde) to get a covariance matrix.

rvm DataFrame (5790 rows × 10 columns):

                     NoDur  Durbl   Manuf   Enrgy   HiTec   Telcm   Shops   Hlth    Utils   Other
Date          lvl1                              
1972-11-30    NoDur  0.00666 0       0       0       0       0       0       0       0       0
              Durbl  0      0.00939 0       0       0       0       0       0       0       0
              Manuf  0      0       0.00803 0       0       0       0       0       0       0
              Enrgy  0      0       0       0.00851 0       0       0       0       0       0
              HiTec  0      0       0       0       0.01205 0       0       0       0       0
              Telcm  0      0       0       0       0       0.00799 0       0       0       0
              Shops  0      0       0       0       0       0       0.00795 0       0       0
              Hlth   0      0       0       0       0       0       0       0.00819 0       0
              Utils  0      0       0       0       0       0       0       0       0.00505 0
              Other  0      0       0       0       0       0       0       0       0       0.00892
1972-11-31    NoDur  0.00664 0       0       0       0       0       0       0       0       0
              Durbl  0      0.00943 0       0       0       0       0       0       0       0
              Manuf  0      0       0.00800 0       0       0       0       0       0       0
              Enrgy  0      0       0       0.00837 0       0       0       0       0       0
              HiTec  0      0       0       0       0.01185 0       0       0       0       0
              Telcm  0      0       0       0       0       0.00792 0       0       0       0
              Shops  0      0       0       0       0       0       0.00794 0       0       0
              Hlth   0      0       0       0       0       0       0       0.00804 0       0
              Utils  0      0       0       0       0       0       0       0       0.00504 0
              Other  0      0       0       0       0       0       0       0       0       0.00889

         

omega_tilde DataFrame (5790 rows × 10 columns):

                     NoDur   Durbl   Manuf   Enrgy   HiTec   Telcm   Shops   Hlth    Utils   Other
Date        level_1                                     
2021-01-31  NoDur    1.00000 0.62369 0.87367 0.65322 0.74356 0.84011 0.77417 0.80183 0.82833 0.84094
            Durbl    0.62369 1.00000 0.69965 0.57501 0.70125 0.60104 0.68652 0.61333 0.45301 0.70556
            Manuf    0.87367 0.69965 1.00000 0.78599 0.81415 0.84477 0.80932 0.82127 0.74803 0.94673
            Enrgy    0.65322 0.57501 0.78599 1.00000 0.59940 0.67492 0.58058 0.61946 0.57830 0.81593
            HiTec    0.74356 0.70125 0.81415 0.59940 1.00000 0.75436 0.91318 0.84508 0.59302 0.81109
            Telcm    0.84011 0.60104 0.84477 0.67492 0.75436 1.00000 0.77555 0.77342 0.73186 0.85595
            Shops    0.77417 0.68652 0.80932 0.58058 0.91318 0.77555 1.00000 0.81197 0.61574 0.79932
            Hlth     0.80183 0.61333 0.82127 0.61946 0.84508 0.77342 0.81197 1.00000 0.70032 0.80875
            Utils    0.82833 0.45301 0.74803 0.57830 0.59302 0.73186 0.61574 0.70032 1.00000 0.72739
            Other    0.84094 0.70556 0.94673 0.81593 0.81109 0.85595 0.79932 0.80875 0.72739 1.00000
2021-02-28  NoDur    1.00000 0.61544 0.87041 0.64622 0.73941 0.83792 0.77075 0.79993 0.82813 0.83937
            Durbl    0.61544 1.00000 0.69464 0.55865 0.70203 0.59109 0.68265 0.60963 0.44792 0.69685 
            Manuf    0.87041 0.69464 1.00000 0.78243 0.81121 0.84189 0.80395 0.81809 0.74489 0.94605
            Enrgy    0.64622 0.55865 0.78243 1.00000 0.58911 0.67134 0.56925 0.61252 0.56865 0.81365
            HiTec    0.73941 0.70203 0.81121 0.58911 1.00000 0.74904 0.91274 0.84179 0.58973 0.80581
            Telcm    0.83792 0.59109 0.84189 0.67134 0.74904 1.00000 0.77078 0.76844 0.72814 0.85493
            Shops    0.77075 0.68265 0.80395 0.56925 0.91274 0.77078 1.00000 0.80924 0.61446 0.79342
            Hlth     0.79993 0.60963 0.81809 0.61252 0.84179 0.76844 0.80924 1.00000 0.69965 0.80394
            Utils    0.82813 0.44792 0.74489 0.56865 0.58973 0.72814 0.61446 0.69965 1.00000 0.72542
            Other    0.83937 0.69685 0.94605 0.81365 0.80581 0.85493 0.79342 0.80394 0.72542 1.00000

The code I tried:

sigma_tilde = omega_tilde.groupby(level='Date').apply(lambda g: rvm_diag.loc[g.name].dot(g.values@(rvm_diag.loc[g.name])))

The error I get:

ValueError: matrices are not aligned

Edit: I also tried the following:

 reshaped = omega_tilde.values.reshape(omega_tilde.index.levels[0].nunique(), omega_tilde.index.levels[1].nunique(), omega_tilde.shape[-1])
 np.einsum('ijk,ik->ijk', rvm_diag.values, np.einsum('ijk,ik->ij', reshaped, rvm_diag.values))

The error here:

 ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (579,10,10)->(579,10,10) (5790,10)->(5790,newaxis,10) 

The output I want has the same format as the omega_tilde DataFrame, so there is one matrix per date.

Any help is appreciated. Thanks!

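The code that gives you ValueError: matrices are not aligned only needs .values added for the matrix multiplication to work. You can use @ for both multiplication steps so that a DataFrame is returned.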

sigma_tilde = (
    omega_tilde
    .groupby(level='Date')
    .apply(lambda g: rvm.loc[g.name].values @ (g.values @ rvm.loc[g.name]))
)

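# additional step to put the industry names back on the second index level
# (set_levels no longer accepts inplace=True in recent pandas, so reassign the index)
sigma_tilde.index = sigma_tilde.index.set_levels(omega_tilde.columns, level=1)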
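
As background on why .values helps: DataFrame.dot aligns the column labels of the left operand with the index labels of the right operand, and raises ValueError: matrices are not aligned when they do not match. The original attempt most likely hit this because the inner product no longer carried the industry names on its index. The sketch below reproduces the effect on made-up 2x2 frames; the names vol, corr and unlabeled and all numbers are illustrative assumptions, not data from the question.

```python
import numpy as np
import pandas as pd

labels = ["NoDur", "Durbl"]
# Hypothetical diagonal volatility block and correlation block for one date.
vol = pd.DataFrame(np.diag([0.1, 0.2]), index=labels, columns=labels)
corr = pd.DataFrame([[1.0, 0.5], [0.5, 1.0]], index=labels, columns=labels)

# A product built from raw arrays gets a default integer index, so
# DataFrame.dot cannot align it with the column labels of vol.
unlabeled = pd.DataFrame(corr.values @ vol.values)
try:
    vol.dot(unlabeled)
    aligned = True
except ValueError as exc:
    aligned = False
    message = str(exc)

# Multiplying the raw arrays skips label alignment; labels are restored afterwards.
cov = pd.DataFrame(vol.values @ corr.values @ vol.values,
                   index=labels, columns=labels)
print(aligned, message)
print(cov)
```

Working on arrays and re-attaching the labels explicitly keeps the alignment step under your control, which is what the index fix above does for the grouped result.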

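On a smaller example (the top-left 3x3 block of the DataFrames above, but with the same values for both months and with both DataFrames using the same two months):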

import numpy as np
import pandas as pd

omega_tilde = pd.DataFrame(
    np.array(
        [[1.00000, 0.62369, 0.87367],
         [0.62369, 1.00000, 0.69965],
         [0.87367, 0.69965, 1.00000],
         [1.00000, 0.62369, 0.87367],
         [0.62369, 1.00000, 0.69965],
         [0.87367, 0.69965, 1.00000]]
    ),
    index = pd.MultiIndex.from_arrays(
        [[pd.Timestamp('2021-01-31'), pd.Timestamp('2021-01-31'),
          pd.Timestamp('2021-01-31'), pd.Timestamp("2021-02-28"),
          pd.Timestamp("2021-02-28"), pd.Timestamp("2021-02-28")],
         ['NoDur', 'Durbl', 'Manuf']*2],
         names=['Date', 'level_1']
    ),
    columns = ['NoDur', 'Durbl', 'Manuf']
)

rvm = pd.DataFrame(
    np.array(
        [[0.00666, 0, 0],
         [0, 0.00939, 0],
         [0, 0, 0.00803],
         [0.00666, 0, 0],
         [0, 0.00939, 0],
         [0, 0, 0.00803]]
    ),
    index = pd.MultiIndex.from_arrays(
        [[pd.Timestamp('2021-01-31'), pd.Timestamp('2021-01-31'),
          pd.Timestamp('2021-01-31'), pd.Timestamp("2021-02-28"),
          pd.Timestamp("2021-02-28"), pd.Timestamp("2021-02-28")],
         ['NoDur', 'Durbl', 'Manuf']*2],
         names=['Date', 'level_1']
    ),
    columns = ['NoDur', 'Durbl', 'Manuf']
)

The multiplication code will produce:

             level_1       NoDur       Durbl       Manuf
2021-01-31     NoDur    0.000044    0.000039    0.000047
               Durbl    0.000039    0.000088    0.000053
               Manuf    0.000047    0.000053    0.000064
2021-02-28     NoDur    0.000044    0.000039    0.000047
               Durbl    0.000039    0.000088    0.000053
               Manuf    0.000047    0.000053    0.000064
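
A possible alternative that avoids groupby altogether is to reshape both frames into 3-D arrays and let NumPy multiply one matrix per date. The einsum attempt in the question appears to fail because rvm_diag.values is still 2-D with shape (5790, 10) while reshaped is 3-D with shape (579, 10, 10). The following is only a sketch on the small example; it assumes both frames cover the same dates in the same row order, and the helper names (corr_block, vol_block, omega_3d, rvm_3d, sigma_3d) are made up for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(["2021-01-31", "2021-02-28"])
names = ["NoDur", "Durbl", "Manuf"]
index = pd.MultiIndex.from_product([dates, names], names=["Date", "level_1"])

corr_block = np.array([[1.00000, 0.62369, 0.87367],
                       [0.62369, 1.00000, 0.69965],
                       [0.87367, 0.69965, 1.00000]])
vol_block = np.diag([0.00666, 0.00939, 0.00803])

# Same values for both months, as in the small example above.
omega_tilde = pd.DataFrame(np.vstack([corr_block] * 2), index=index, columns=names)
rvm = pd.DataFrame(np.vstack([vol_block] * 2), index=index, columns=names)

n_dates, n = len(dates), len(names)
# Turn the stacked (n_dates * n, n) frames into (n_dates, n, n) arrays.
omega_3d = omega_tilde.to_numpy().reshape(n_dates, n, n)
rvm_3d = rvm.to_numpy().reshape(n_dates, n, n)

# The @ operator broadcasts over the leading axis: one D @ Omega @ D per date.
sigma_3d = rvm_3d @ omega_3d @ rvm_3d

# Flatten back to the layout of omega_tilde and reuse its labels.
sigma_tilde = pd.DataFrame(sigma_3d.reshape(n_dates * n, n),
                           index=omega_tilde.index, columns=omega_tilde.columns)
print(sigma_tilde.round(6))
```

Because each volatility block is diagonal, every entry is just sigma_i * rho_ij * sigma_j, so a single block can be cross-checked against np.outer(vols, vols) * corr_block.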