Python 相关矩阵 3d 数据帧

Python correlation matrix 3d dataframe

我在 SQL 服务器中有一个历史 return table 按日期和资产 ID 是这样的:

[Date] [Asset] [1DRet]
jan   asset1   0.52
jan   asset2   0.12
jan   asset3   0.07
feb   asset1   0.41
feb   asset2   0.33
feb   asset3   0.21

...

所以我需要计算给定日期范围内所有资产组合的相关矩阵:A1,A2; A1,A3 ; A2,A3

我正在使用 pandas 并在我的 SQL Select 中过滤日期范围并按日期排序。

我正在尝试使用 pandas df.corr()、numpy.corrcoef 和 Scipy 来做到这一点,但无法为我的 n 变量数据帧做到这一点

我看到了一些示例,但它始终适用于数据框,在该数据框中,您每天每列和一行都有一个资产。

这是我正在执行的代码块:

qryRet = "Select * from IndexesValue where Date > '20100901' and Date < '20150901' order by Date"

result = conn.execute(qryRet)

df = pd.DataFrame(data=list(result),columns=result.keys())

df1d = df[['Date','Id_RiskFactor','1DReturn']]

corr = df1d.set_index(['Date','Id_RiskFactor']).unstack().corr()
corr.columns = corr.columns.droplevel()
corr.index = corr.columns.tolist()
corr.index.name = 'symbol_1'
corr.columns.name = 'symbol_2'
print(corr)

conn.close()

为此,我收到了这条消息:

corr.columns = corr.columns.droplevel()
AttributeError: 'Index' object has no attribute 'droplevel'

**Print(df1d.head())**
         Date  Id_RiskFactor         1DReturn
0  2010-09-02            149            0E-12
1  2010-09-02            150  -0.004242875148
2  2010-09-02             33   0.000590000011
3  2010-09-02             28   0.000099999997
4  2010-09-02             34  -0.000010000000

**print(df.head())**
         Date  Id_RiskFactor           Value         1DReturn         5DReturn
0  2010-09-02            149  0.040096000000            0E-12            0E-12
1  2010-09-02            150  1.736700000000  -0.004242875148  -0.013014321215
2  2010-09-02             33  2.283000000000   0.000590000011   0.001260000048
3  2010-09-02             28  2.113000000000   0.000099999997   0.000469999999
4  2010-09-02             34  0.615000000000  -0.000010000000   0.000079999998

**print(corr.columns)**
Index([], dtype='object')

创建示例 DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame({'daily_return': np.random.random(15), 
                   'symbol': ['A'] * 5 + ['B'] * 5 + ['C'] * 5, 
                   'date': np.tile(pd.date_range('1-1-2015', periods=5), 3)})

>>> df
    daily_return       date symbol
0       0.011467 2015-01-01      A
1       0.613518 2015-01-02      A
2       0.334343 2015-01-03      A
3       0.371809 2015-01-04      A
4       0.169016 2015-01-05      A
5       0.431729 2015-01-01      B
6       0.474905 2015-01-02      B
7       0.372366 2015-01-03      B
8       0.801619 2015-01-04      B
9       0.505487 2015-01-05      B
10      0.946504 2015-01-01      C
11      0.337204 2015-01-02      C
12      0.798704 2015-01-03      C
13      0.311597 2015-01-04      C
14      0.545215 2015-01-05      C

我假设您已经针对相关日期过滤了 DataFrame。然后,您需要一个数据透视表 table,其中您将唯一日期作为索引,将符号作为单独的列,每天 returns 作为值。最后,您对结果调用 corr()

corr = df.set_index(['date','symbol']).unstack().corr()
corr.columns = corr.columns.droplevel()
corr.index = corr.columns.tolist()  
corr.index.name = 'symbol_1'
corr.columns.name = 'symbol_2'
>>> corr
symbol_2         A         B         C
symbol_1                              
A         1.000000  0.188065 -0.745115
B         0.188065  1.000000 -0.688808
C        -0.745115 -0.688808  1.000000

您可以 select 基于日期的 DataFrame 子集,如下所示:

start_date = pd.Timestamp('2015-1-4')
end_date = pd.Timestamp('2015-1-5')
>>> df.loc[df.date.between(start_date, end_date), :]
    daily_return       date symbol
3       0.371809 2015-01-04      A
4       0.169016 2015-01-05      A
8       0.801619 2015-01-04      B
9       0.505487 2015-01-05      B
13      0.311597 2015-01-04      C
14      0.545215 2015-01-05      C

如果您想展平相关矩阵:

corr.stack().reset_index()
  symbol_1 symbol_2         0
0        A        A  1.000000
1        A        B  0.188065
2        A        C -0.745115
3        B        A  0.188065
4        B        B  1.000000
5        B        C -0.688808
6        C        A -0.745115
7        C        B -0.688808
8        C        C  1.000000