Apply PCA on MultiIndex for several dates
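I am trying to perform a principal component analysis on a MultiIndex DataFrame that holds one correlation matrix per date, for several dates. For each of those dates, I want to run PCA on that date's correlation matrix.
Any help is appreciated.
DataFrame: rolling_cor_monthly (6140 rows × 10 columns):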
NoDur Durbl Manuf Enrgy HiTec Telcm Shops Hlth Utils Other
Date level_1
2021-01-31 NoDur 1.00000 0.62369 0.87367 0.65322 0.74356 0.84011 0.77417 0.80183 0.82833 0.84094
Durbl 0.62369 1.00000 0.69965 0.57501 0.70125 0.60104 0.68652 0.61333 0.45301 0.70556
Manuf 0.87367 0.69965 1.00000 0.78599 0.81415 0.84477 0.80932 0.82127 0.74803 0.94673
Enrgy 0.65322 0.57501 0.78599 1.00000 0.59940 0.67492 0.58058 0.61946 0.57830 0.81593
HiTec 0.74356 0.70125 0.81415 0.59940 1.00000 0.75436 0.91318 0.84508 0.59302 0.81109
Telcm 0.84011 0.60104 0.84477 0.67492 0.75436 1.00000 0.77555 0.77342 0.73186 0.85595
Shops 0.77417 0.68652 0.80932 0.58058 0.91318 0.77555 1.00000 0.81197 0.61574 0.79932
Hlth 0.80183 0.61333 0.82127 0.61946 0.84508 0.77342 0.81197 1.00000 0.70032 0.80875
Utils 0.82833 0.45301 0.74803 0.57830 0.59302 0.73186 0.61574 0.70032 1.00000 0.72739
Other 0.84094 0.70556 0.94673 0.81593 0.81109 0.85595 0.79932 0.80875 0.72739 1.00000
2021-02-28 NoDur 1.00000 0.61544 0.87041 0.64622 0.73941 0.83792 0.77075 0.79993 0.82813 0.83937
Durbl 0.61544 1.00000 0.69464 0.55865 0.70203 0.59109 0.68265 0.60963 0.44792 0.69685
Manuf 0.87041 0.69464 1.00000 0.78243 0.81121 0.84189 0.80395 0.81809 0.74489 0.94605
Enrgy 0.64622 0.55865 0.78243 1.00000 0.58911 0.67134 0.56925 0.61252 0.56865 0.81365
HiTec 0.73941 0.70203 0.81121 0.58911 1.00000 0.74904 0.91274 0.84179 0.58973 0.80581
Telcm 0.83792 0.59109 0.84189 0.67134 0.74904 1.00000 0.77078 0.76844 0.72814 0.85493
Shops 0.77075 0.68265 0.80395 0.56925 0.91274 0.77078 1.00000 0.80924 0.61446 0.79342
Hlth 0.79993 0.60963 0.81809 0.61252 0.84179 0.76844 0.80924 1.00000 0.69965 0.80394
Utils 0.82813 0.44792 0.74489 0.56865 0.58973 0.72814 0.61446 0.69965 1.00000 0.72542
Other 0.83937 0.69685 0.94605 0.81365 0.80581 0.85493 0.79342 0.80394 0.72542 1.00000
Code I tried:
eigenvalues, eigenvectors = LA.eig(rolling_cor_monthly)
idx = eigenvalues.argsort()[::-1]
D = pd.DataFrame(data = np.diag(eigenvalues[idx]))
P = pd.DataFrame(data = eigenvectors[:,idx])
Error:
LinAlgError: Last 2 dimensions of the array must be square
I would like the output to be in the same format as the DataFrame.
Thanks a lot!
This requires handling the extra dimension, so it is a bit more involved:
import numpy as np
import numpy.linalg as LA
import pandas as pd
df = rolling_cor_monthly  # the DataFrame from the question
# convert dataframe to a 3-d array (the new first axis corresponds to the date index)
arr = df.values.reshape((len(df.index.levels[0]), 10, 10))
# get eigenvalues (n x 10) and eigenvectors (n x 10 x 10)
eigenvalues, eigenvectors = LA.eig(arr)
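The reshape above assumes that every date contributes a full square block and that the rows are already grouped by date. A minimal sketch of a more defensive version follows, run on a hypothetical toy frame (`toy`, with two dates and three made-up series names) instead of the real data. It derives the sizes from the frame rather than hard-coding 10. One caveat worth knowing: `index.levels[0]` can retain dates that were filtered out earlier, so this sketch counts the dates actually present.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 2 dates x 3 series, standing in for rolling_cor_monthly
dates = pd.to_datetime(["2021-01-31", "2021-02-28"])
names = ["A", "B", "C"]
blocks = [
    np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]),
    np.array([[1.0, 0.4, 0.1], [0.4, 1.0, 0.6], [0.1, 0.6, 1.0]]),
]
toy = pd.DataFrame(
    np.vstack(blocks),
    index=pd.MultiIndex.from_product([dates, names], names=["Date", "level_1"]),
    columns=names,
)

k = toy.shape[1]                                   # matrix size, taken from the columns
n_dates = toy.index.get_level_values(0).nunique()  # dates actually present in the index
assert len(toy) == n_dates * k, "every date must contribute exactly k rows"

# One (k x k) matrix per date; rows must already be grouped by date
arr = toy.to_numpy().reshape(n_dates, k, k)

# np.linalg.eig works on the last two axes, so all dates are handled in one call
eigenvalues, eigenvectors = np.linalg.eig(arr)
print(eigenvalues.shape, eigenvectors.shape)
```

If the length check fails, some date is missing rows and a plain reshape would silently mix matrices from different dates.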
The rest of your code (sorting and converting to DataFrames) can be written as:
# sort each date's eigenvalues in descending order and keep the permutation,
# so that the eigenvector columns can be reordered in exactly the same way
idx = eigenvalues.argsort(axis=1)[:, ::-1]
eigenvalues = np.take_along_axis(eigenvalues, idx, axis=1)
# eigenvectors are the columns (last axis) of each 10 x 10 block
eigenvectors = np.take_along_axis(eigenvectors, idx[:, np.newaxis, :], axis=2)
D = pd.DataFrame(
    np.apply_along_axis(np.diag, 1, eigenvalues).reshape(-1, 10),
    index=df.index
)
P = pd.DataFrame(
    eigenvectors.reshape(-1, 10),
    index=df.index
)
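Because a correlation matrix is symmetric, `np.linalg.eigh` is a natural alternative to `eig`. It returns real eigenvalues in ascending order, so reversing the last axis gives the usual descending PCA order for values and vectors together, with no separate sort. The following is a minimal sketch under that assumption. It uses a hypothetical frame (`toy`, two dates and three made-up series names) in place of `rolling_cor_monthly`, and it is a variation on the approach above rather than the original method.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one random 3 x 3 correlation matrix per date
rng = np.random.default_rng(0)
dates = pd.to_datetime(["2021-01-31", "2021-02-28"])
names = ["A", "B", "C"]
blocks = [np.corrcoef(rng.normal(size=(3, 50))) for _ in dates]
toy = pd.DataFrame(
    np.vstack(blocks),
    index=pd.MultiIndex.from_product([dates, names], names=["Date", "level_1"]),
    columns=names,
)

k = toy.shape[1]
arr = toy.to_numpy().reshape(-1, k, k)  # (n_dates, k, k)

# eigh is meant for symmetric matrices; eigenvalues come back ascending per matrix
w, v = np.linalg.eigh(arr)
w = w[:, ::-1]     # descending eigenvalues
v = v[:, :, ::-1]  # reorder the eigenvector columns to match

# D stacks one diagonal eigenvalue matrix per date, P one eigenvector matrix per date
D = pd.DataFrame((w[:, :, None] * np.eye(k)).reshape(-1, k), index=toy.index)
P = pd.DataFrame(v.reshape(-1, k), index=toy.index)

# Sanity check for the first date: P D P^T should rebuild the correlation matrix
P0 = P.loc[dates[0]].to_numpy()
D0 = D.loc[dates[0]].to_numpy()
print(np.allclose(P0 @ D0 @ P0.T, toy.loc[dates[0]].to_numpy()))
```

Both `D` and `P` reuse the original MultiIndex, so `P.loc[date]` returns the eigenvector block for a single date, matching the layout asked for in the question.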