How to roll over an explicitly defined covariance matrix?

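I have defined a weighted covariance matrix, and now I am trying to roll it over time. That is, I want to obtain a weighted covariance matrix over a rolling window of 60. As an example, I will use the population covariance matrix: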

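import numpy as np

def cm(data):
    # data is a pandas DataFrame: rows are observations, columns are variables
    data = data.values
    row_data = data.shape[0]
    col_data = data.shape[1]

    cov_mat = np.zeros([col_data, col_data])

    for i in range(0, col_data):
        for j in range(0, col_data):
            mean_1 = np.mean(data[:, i])
            mean_2 = np.mean(data[:, j])
            total = 0

            # Sum of cross-products of deviations from the column means
            for k in range(0, row_data):
                total = total + (data[k][i] - mean_1) * (data[k][j] - mean_2)

            # Population covariance: divide by the number of observations
            cov_mat[i][j] = total * (1 / row_data)

    return cov_mat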

For this particular case, how can I roll the matrix over time efficiently?
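As a side note, the triple loop for the population covariance can be written as a single matrix product on the centred data. The following is only a minimal sketch: the function name `population_cov` and the random `sample` array are hypothetical and not part of the original post.

```python
import numpy as np

def population_cov(values):
    """Population covariance (divide by n) of the columns of a 2-D array."""
    values = np.asarray(values, dtype=float)
    n_rows = values.shape[0]
    centered = values - values.mean(axis=0)   # subtract each column's mean
    return centered.T @ centered / n_rows

# Hypothetical sample data, for illustration only
rng = np.random.default_rng(0)
sample = rng.normal(size=(100, 3))
result = population_cov(sample)
```

The result should agree with `np.cov(sample, rowvar=False, bias=True)`, which also divides by the number of observations.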

Update:

After some trial and error, I managed to solve part of my own problem by adding a for loop that iterates over the rolling periods:

In:

rolling_window = 60

def cm(data):
    data = data.values
    row_data = data.shape[0]
    col_data = data.shape[1]

    # Define the number of rolls that have to be made:
    rolls = row_data - rolling_window

    # Define an empty list which will be filled with COV/VAR matrices:
    cov_mat_main = []

    for t in range(rolls):
        cov_mat = np.zeros([col_data, col_data])

        for i in range(0, col_data):
            for j in range(0, col_data):
                mean_1 = np.mean(data[t:rolling_window+t, i])
                mean_2 = np.mean(data[t:rolling_window+t, j])

                total = 0
                for k in range(t, rolling_window+t):
                    total = total + (data[k][i]-mean_1)*(data[k][j]-mean_2)

                cov_mat[i][j] = total * (1/row_data)

        cov_mat_main.append(cov_mat)

    cov_mat_main = np.array(cov_mat_main)

    return cov_mat_main

cm(df)

Out:

[[ 5.81310317e-07 -1.37889464e-06 -3.57360335e-07]
  [-1.37889464e-06  8.73264313e-06  6.19930936e-06]
  [-3.57360335e-07  6.19930936e-06  9.02566589e-06]]

 [[ 4.03349133e-07 -1.31881055e-06 -6.03769261e-07]
  [-1.31881055e-06  8.76683970e-06  6.26991034e-06]
  [-6.03769261e-07  6.26991034e-06  8.68739335e-06]]]

However, the output of this function does not match the output of the built-in function.

In:

cm = df.rolling(rolling_window).cov()

Out:

 [[ 4.50638342e-06 -1.47342972e-05 -6.74556002e-06]
  [-1.47342972e-05  9.79467608e-05  7.00500328e-05]
  [-6.74556002e-06  7.00500328e-05  9.70591532e-05]]

 [[ 3.41189600e-06 -9.47500359e-06 -4.76181287e-06]
  [-9.47500359e-06  7.50918104e-05  5.93125976e-05]
  [-4.76181287e-06  5.93125976e-05  9.40643303e-05]]]

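The DataFrame contains no missing values, so missing data cannot explain the deviation between my explicitly defined matrix and the .cov() matrix.

Hopefully someone can point out the mistake.

Any suggestions?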
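One thing worth checking when a hand-written covariance disagrees with a library result is the divisor. pandas' `cov` uses `ddof=1` by default, so it divides by the window length minus one, whereas a population covariance divides by the number of observations. The sketch below uses a hypothetical random array standing in for a single 60-row window, and shows that the two conventions differ only by a constant factor.

```python
import numpy as np

rng = np.random.default_rng(1)
window = rng.normal(size=(60, 3))    # one hypothetical 60-row window
n = window.shape[0]

sample_cov = np.cov(window, rowvar=False)                  # divides by n - 1
population_cov = np.cov(window, rowvar=False, bias=True)   # divides by n

# The two estimates differ only by the constant factor n / (n - 1)
ratio = sample_cov / population_cov
```

If the observed ratio between two outputs is not close to n / (n - 1), the divisor alone is unlikely to be the whole story, and the window boundaries or the window length itself are the next things to compare.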

After some trial and error, I managed to solve my own problem.

For anyone interested in the solution:

import numpy as np

rolling_window = 30

def cm(data):
    data = data.values
    row_data = data.shape[0]
    col_data = data.shape[1]

    # Number of rolls, i.e. the number of VAR/COV matrices to calculate
    rolls = row_data - rolling_window

    # List to which one VAR/COV matrix is appended for every roll
    cov_mat_main = []

    for t in range(rolls):
        cov_mat = np.zeros([col_data, col_data])
        # The estimation window covers rows begin_est .. end_est - 1
        begin_est = t + 1
        end_est = rolling_window + t + 1

        for i in range(0, col_data):
            for j in range(0, col_data):
                mean_1 = np.mean(data[begin_est:end_est, i])
                mean_2 = np.mean(data[begin_est:end_est, j])
                total = 0

                for k in range(begin_est, end_est):
                    total = total + (data[k][i] - mean_1) * (data[k][j] - mean_2)
                # Sample covariance: divide by (window length - 1)
                cov_mat[i][j] = total * (1 / (rolling_window - 1))

        cov_mat_main.append(cov_mat)

    cov_mat_main = np.array(cov_mat_main)

    return cov_mat_main

print(cm(df))

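It seems I had to take the following into account:

  • the degrees of freedom (dividing by rolling_window - 1)
  • dividing 'total' by the rolling window length rather than by row_data
  • shifting the start and the end of the estimation window by 1 time unit

to bring it into line with the .cov() function.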
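The three points above can also be expressed without explicit Python loops. The following is a minimal sketch rather than the author's method: it assumes NumPy 1.20 or later for `sliding_window_view`, and the names `rolling_cov` and `sample` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_cov(values, window, ddof=1):
    """Covariance matrix of the columns of `values` for every full window."""
    values = np.asarray(values, dtype=float)
    # Shape (n_windows, n_cols, window): the window axis is appended last
    windows = sliding_window_view(values, window, axis=0)
    centered = windows - windows.mean(axis=-1, keepdims=True)
    # Batched product: (n_windows, n_cols, window) @ (n_windows, window, n_cols)
    return centered @ centered.transpose(0, 2, 1) / (window - ddof)

# Hypothetical sample data, for illustration only
rng = np.random.default_rng(2)
sample = rng.normal(size=(40, 3))
covs = rolling_cov(sample, window=30)
```

This returns one matrix per full window, starting with rows 0 to window - 1, so it yields row_data - window + 1 matrices. The looped function in this post starts at row 1, so its output would correspond to `covs[1:]`. Passing `ddof=0` gives the population version instead.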

The output of this explicitly defined matrix:

 [[ 4.50638342e-06 -1.47342972e-05 -6.74556002e-06]
  [-1.47342972e-05  9.79467608e-05  7.00500328e-05]
  [-6.74556002e-06  7.00500328e-05  9.70591532e-05]]

 [[ 3.41189600e-06 -9.47500359e-06 -4.76181287e-06]
  [-9.47500359e-06  7.50918104e-05  5.93125976e-05]
  [-4.76181287e-06  5.93125976e-05  9.40643303e-05]]]

is consistent with the output of df.rolling(rolling_window).cov():

 [[ 4.50638342e-06 -1.47342972e-05 -6.74556002e-06]
  [-1.47342972e-05  9.79467608e-05  7.00500328e-05]
  [-6.74556002e-06  7.00500328e-05  9.70591532e-05]]

 [[ 3.41189600e-06 -9.47500359e-06 -4.76181287e-06]
  [-9.47500359e-06  7.50918104e-05  5.93125976e-05]
  [-4.76181287e-06  5.93125976e-05  9.40643303e-05]]]
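For anyone who wants to repeat this comparison programmatically, note that `df.rolling(...).cov()` on a DataFrame returns a frame indexed by (row label, column name), so it has to be reshaped before it can be compared with a stack of matrices. The sketch below uses a hypothetical random DataFrame, since the original data is not shown, and compares the pandas result with `np.cov` applied to each window.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the original df
rng = np.random.default_rng(3)
df = pd.DataFrame(rng.normal(size=(40, 3)), columns=["a", "b", "c"])
rolling_window = 30
n_cols = df.shape[1]

# One (n_cols x n_cols) block per row of df, stacked vertically
stacked = df.rolling(rolling_window).cov()
pandas_covs = stacked.to_numpy().reshape(len(df), n_cols, n_cols)
pandas_covs = pandas_covs[rolling_window - 1:]   # drop incomplete (NaN) windows

# Reference: sample covariance (ddof=1) of each full window
manual_covs = np.array([
    np.cov(df.to_numpy()[t:t + rolling_window], rowvar=False)
    for t in range(len(df) - rolling_window + 1)
])

matches = np.allclose(pandas_covs, manual_covs)
```

Using `np.allclose` instead of exact equality allows for small floating-point differences between the two calculations.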