How to roll over an explicitly defined covariance matrix?
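I have defined a weighted covariance (COVAR) matrix. Now I am trying to roll it over time.
That is, I want to obtain a weighted covariance matrix over a rolling window of 60.
As an example, I will use the population covariance matrix: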
import numpy as np

def cm(data):
    data = data.values
    row_data = data.shape[0]
    col_data = data.shape[1]
    cov_mat = np.zeros([col_data, col_data])
    for i in range(0, col_data):
        for j in range(0, col_data):
            mean_1 = np.mean(data[:, i])
            mean_2 = np.mean(data[:, j])
            total = 0
            for k in range(0, row_data):
                total = total + (data[k][i]-mean_1)*(data[k][j]-mean_2)
            # Population covariance: divide by the number of observations
            cov_mat[i][j] = total * (1/row_data)
    return cov_mat
For this particular case, how can I roll the matrix efficiently?
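As a point of comparison for the nested loops above, here is a minimal vectorized sketch of the same population covariance (dividing by the number of observations). The function name `population_cov` and the synthetic data are assumptions made for illustration; they are not part of the original post.

```python
import numpy as np

def population_cov(values):
    """Population covariance (divide by N) of the columns of a 2-D array."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean(axis=0)  # subtract each column's mean
    return centered.T @ centered / values.shape[0]

# Synthetic data, used only to exercise the function
rng = np.random.default_rng(0)
sample = rng.normal(size=(100, 3))
result = population_cov(sample)
```

With `bias=True`, `np.cov(sample, rowvar=False, bias=True)` uses the same divisor, so it can serve as a cross-check for `result`.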
Update:
After some trial and error, I managed to solve part of my own problem by adding a for loop that iterates over the rolling periods:
In:
rolling_window = 60

def cm(data):
    data = data.values
    row_data = data.shape[0]
    col_data = data.shape[1]
    # Define the number of rolls that have to be made:
    rolls = row_data - rolling_window
    # Define an empty list which will be filled with COV/VAR matrices:
    cov_mat_main = []
    for t in range(rolls):
        cov_mat = np.zeros([col_data, col_data])
        for i in range(0, col_data):
            for j in range(0, col_data):
                mean_1 = np.mean(data[t:rolling_window+t, i])
                mean_2 = np.mean(data[t:rolling_window+t, j])
                total = 0
                for k in range(t, rolling_window+t):
                    total = total + (data[k][i]-mean_1)*(data[k][j]-mean_2)
                cov_mat[i][j] = total * (1/row_data)
        cov_mat_main.append(cov_mat)
    cov_mat_main = np.array(cov_mat_main)
    return cov_mat_main

cm(df)
Out:
[[ 5.81310317e-07 -1.37889464e-06 -3.57360335e-07]
[-1.37889464e-06 8.73264313e-06 6.19930936e-06]
[-3.57360335e-07 6.19930936e-06 9.02566589e-06]]
[[ 4.03349133e-07 -1.31881055e-06 -6.03769261e-07]
[-1.31881055e-06 8.76683970e-06 6.26991034e-06]
[-6.03769261e-07 6.26991034e-06 8.68739335e-06]]]
However, the output of this function does not seem to agree with the output of the built-in function.
In:
cm = df.rolling(rolling_window).cov()
Out:
[[ 4.50638342e-06 -1.47342972e-05 -6.74556002e-06]
[-1.47342972e-05 9.79467608e-05 7.00500328e-05]
[-6.74556002e-06 7.00500328e-05 9.70591532e-05]]
[[ 3.41189600e-06 -9.47500359e-06 -4.76181287e-06]
[-9.47500359e-06 7.50918104e-05 5.93125976e-05]
[-4.76181287e-06 5.93125976e-05 9.40643303e-05]]]
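The DataFrame contains no missing values, so missing data cannot explain the deviation of my defined matrix from the .cov() matrix.
Hopefully someone can point out the mistake.
Any suggestions?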
After some trial and error, I managed to solve my own problem.
For anyone interested in the solution:
import numpy as np

rolling_window = 30

def cm(data):
    data = data.values
    row_data = data.shape[0]
    col_data = data.shape[1]
    # Number of rolls to take, i.e. the number of VAR/COV matrices to calculate
    rolls = row_data - rolling_window
    # Empty list to which one VAR/COV matrix is appended for every roll
    cov_mat_main = []
    for t in range(rolls):
        cov_mat = np.zeros([col_data, col_data])
        begin_est = t+1
        end_est = rolling_window+t+1
        for i in range(0, col_data):
            for j in range(0, col_data):
                mean_1 = np.mean(data[begin_est:end_est, i])
                mean_2 = np.mean(data[begin_est:end_est, j])
                total = 0
                for k in range(begin_est, end_est):
                    total = total + (data[k][i]-mean_1)*(data[k][j]-mean_2)
                # Sample covariance: divide by (window length - 1)
                cov_mat[i][j] = total * (1/(rolling_window-1))
        cov_mat_main.append(cov_mat)
    cov_mat_main = np.array(cov_mat_main)
    return cov_mat_main

print(cm(df))
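It seems I had to account for:
- the degrees of freedom
- dividing 'total' by rolling_window rather than row_data
- adding 1 time unit to the start and end of the estimation window
to make it line up with the .cov() function.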
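Since the question asked for an efficient way to roll the matrix, the following is a minimal vectorized sketch of a rolling sample covariance, checked against `df.rolling(window).cov()`. The names `rolling_sample_cov` and `demo`, and the random data, are assumptions for illustration only. It relies on `numpy.lib.stride_tricks.sliding_window_view`, which requires NumPy 1.20 or later.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def rolling_sample_cov(values, window):
    """Sample covariance (divide by window - 1) for every full window."""
    values = np.asarray(values, dtype=float)
    # Shape (n_windows, n_cols, window): one slice per full window
    windows = sliding_window_view(values, window, axis=0)
    centered = windows - windows.mean(axis=-1, keepdims=True)
    return centered @ centered.transpose(0, 2, 1) / (window - 1)

# Synthetic stand-in for the DataFrame in the post
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])
window = 30
manual = rolling_sample_cov(demo.to_numpy(), window)

# pandas returns a MultiIndex frame with one (n_cols x n_cols) block per row
n_rows, n_cols = demo.shape
builtin = demo.rolling(window).cov().to_numpy().reshape(n_rows, n_cols, n_cols)
builtin = builtin[window - 1:]  # the first window - 1 blocks are all NaN
print(manual.shape, np.allclose(manual, builtin))
```

This sketch returns one matrix per full window, including the first one covering rows 0 to window - 1, so it yields one more matrix than the loop above, which starts at row 1.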
The matrix defined this way gives, out:
[[ 4.50638342e-06 -1.47342972e-05 -6.74556002e-06]
[-1.47342972e-05 9.79467608e-05 7.00500328e-05]
[-6.74556002e-06 7.00500328e-05 9.70591532e-05]]
[[ 3.41189600e-06 -9.47500359e-06 -4.76181287e-06]
[-9.47500359e-06 7.50918104e-05 5.93125976e-05]
[-4.76181287e-06 5.93125976e-05 9.40643303e-05]]]
which is consistent with df.rolling(rolling_window).cov():
[[ 4.50638342e-06 -1.47342972e-05 -6.74556002e-06]
[-1.47342972e-05 9.79467608e-05 7.00500328e-05]
[-6.74556002e-06 7.00500328e-05 9.70591532e-05]]
[[ 3.41189600e-06 -9.47500359e-06 -4.76181287e-06]
[-9.47500359e-06 7.50918104e-05 5.93125976e-05]
[-4.76181287e-06 5.93125976e-05 9.40643303e-05]]]
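The original goal was a rolling weighted covariance matrix, but the post does not state the weighting scheme. As a hedged sketch, the rolling approach could be extended as below, assuming weights that are normalized to sum to 1 and applied to each window from oldest to newest observation. The function `rolling_weighted_cov` and the decay factor 0.97 are hypothetical choices for illustration, not the author's method.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_weighted_cov(values, weights):
    """Weighted population covariance for every full window.

    weights[0] applies to the oldest observation in each window.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    # Shape (n_windows, n_cols, window)
    windows = sliding_window_view(values, weights.size, axis=0)
    means = windows @ weights  # weighted mean per window and column
    centered = windows - means[:, :, None]
    return (centered * weights) @ centered.transpose(0, 2, 1)

rng = np.random.default_rng(0)
values = rng.normal(size=(200, 3))  # synthetic data for illustration
window = 60

# Hypothetical exponentially decaying weights, newest observation weighted most
decay_weights = 0.97 ** np.arange(window - 1, -1, -1)
weighted = rolling_weighted_cov(values, decay_weights)

# Equal weights reduce to the ordinary population covariance of each window
equal = rolling_weighted_cov(values, np.ones(window))
```

With equal weights the result matches the population covariance of each window, which gives a simple sanity check before switching to a non-uniform weighting.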