Calculating the differences between a series of correlation matrices (as pandas DataFrames) in Python
I have a series of DataFrames containing daily log returns for a large number of stocks, e.g.:
data_list = [data_2015, data_2016, data_2017, data_2018, data_2019, data_2020]
The task at hand is to calculate the change in correlations between each pair of consecutive years, e.g.:
data_2015.corr() - data_2016.corr()
What I need is the element-wise difference/change. A simple for loop gives a very poor answer, and I'm stuck:
for i in data_list:
    j = i +1
    a = (i).corr()
    b = (j).corr()
    print(a-b)
A stylized example would work as follows:
#import pandas and numpy
import numpy as np
import pandas as pd
#create four symmetric matrices with 1 on the diagonal as correlation matrices
np.random.seed(39)
b = np.random.randint(-100,100,size=(4,4))/100
b_symm = (b + b.T)/2
np.fill_diagonal(b_symm, 1)  #modifies b_symm in place and returns None
c = np.random.randint(-100,100,size=(4,4))/100
c_symm = (c + c.T)/2
np.fill_diagonal(c_symm, 1)
d = np.random.randint(-100,100,size=(4,4))/100
d_symm = (d + d.T)/2
np.fill_diagonal(d_symm, 1)
e = np.random.randint(-100,100,size=(4,4))/100
e_symm = (e + e.T)/2
np.fill_diagonal(e_symm, 1)
#convert to DataFrame
data_2015 = pd.DataFrame(b_symm)
data_2016 = pd.DataFrame(c_symm)
data_2017 = pd.DataFrame(d_symm)
data_2018 = pd.DataFrame(e_symm)
#print DataFrames
print(data_2015)
print(data_2016)
print(data_2017)
print(data_2018)
#print intended result(s)
print("Change in correlations 2015-16",'\n',data_2015-data_2016,'\n')
print("Change in correlations 2016-17",'\n',data_2016-data_2017,'\n')
print("Change in correlations 2017-18",'\n',data_2017-data_2018,'\n')
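If I understand correctly, you want to access consecutive pairs of elements of data_list so that you can compute the difference between their correlations? There are many ways to do this; a somewhat ugly but at least transparent one is: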
for i in range(len(data_list)-1):
    a = data_list[i+1].corr()  # later year
    b = data_list[i].corr()    # earlier year
    print(a-b)                 # later minus earlier; swap the operands for the opposite sign
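As an aside, the loop in the question goes wrong because `i` is itself a DataFrame, so `i + 1` adds 1 to every value instead of stepping to the next DataFrame. Correlation is unaffected by a constant shift, so `a - b` comes out as essentially all zeros. A minimal sketch of that effect, where `returns` is a made-up stand-in for one year of data (not the real return series):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one year of daily log returns (3 stocks).
rng = np.random.default_rng(0)
returns = pd.DataFrame(rng.normal(size=(250, 3)), columns=["A", "B", "C"])

# Adding 1 shifts every value; it does not move on to another DataFrame.
shifted = returns + 1

# Correlation ignores a constant shift, so the difference is zero
# up to floating-point rounding.
diff = returns.corr() - shifted.corr()
print(diff.abs().max().max())
```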
A more Pythonic way to write it is:
for a,b in zip(data_list[1:],data_list[:-1]):
    print(a.corr()-b.corr())
or even just:
[print(a.corr() - b.corr()) for a,b in zip(data_list[1:],data_list[:-1])]
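If you want to keep the results rather than just print them (the list comprehension above only builds a list of `None` values), one option is to collect the differences in a dict keyed by the pair of years. This is only a sketch: the `years` list, the random stand-in data and the column names are assumptions for illustration. It follows the sign used in the loops above (later year minus earlier year); swap the operands if you want `data_2015.corr() - data_2016.corr()` as written in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the yearly return DataFrames.
rng = np.random.default_rng(0)
years = [2015, 2016, 2017, 2018]
data_list = [
    pd.DataFrame(rng.normal(size=(250, 3)), columns=["A", "B", "C"])
    for _ in years
]

# Compute each correlation matrix once instead of twice per pair.
corrs = [df.corr() for df in data_list]

# Element-wise change, later year minus earlier year,
# keyed by (earlier_year, later_year).
changes = {
    (y0, y1): c1 - c0
    for y0, y1, c0, c1 in zip(years, years[1:], corrs, corrs[1:])
}

for (y0, y1), change in changes.items():
    print(f"Change in correlations {y0}-{y1}")
    print(change, "\n")
```

Because pandas aligns on row and column labels when subtracting, this relies on every year's DataFrame having the same set of stock columns; any stock missing from one of the two years shows up as NaN in the difference.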