Pandas pairwise arithmetic similar to rolling().corr()
I have a dataframe as follows:
fsym EOS BTC BNB
time
2018-11-30 00:00:00+00:00 -0.051903 -0.069088 -0.058162
2018-12-01 00:00:00+00:00 0.026936 0.044739 0.040303
2018-12-02 00:00:00+00:00 -0.034843 -0.012935 -0.005900
2018-12-03 00:00:00+00:00 -0.108108 -0.070375 -0.028180
2018-12-04 00:00:00+00:00 -0.048583 0.019509 0.131986
I can easily compute the pairwise correlations between its columns:
pt = pt.rolling(3).corr()
which produces:
sym                                 EOS       BTC       BNB
time                      fsym
2018-11-30 00:00:00+00:00 EOS       NaN       NaN       NaN
                          BTC       NaN       NaN       NaN
                          BNB       NaN       NaN       NaN
2018-12-01 00:00:00+00:00 EOS       NaN       NaN       NaN
                          BTC       NaN       NaN       NaN
                          BNB       NaN       NaN       NaN
2018-12-02 00:00:00+00:00 EOS  1.000000  0.952709  0.938688
                          BTC  0.952709  1.000000  0.999066
                          BNB  0.938688  0.999066  1.000000
2018-12-03 00:00:00+00:00 EOS  1.000000  0.998738  0.969385
                          BTC  0.998738  1.000000  0.980492
                          BNB  0.969385  0.980492  1.000000
...
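How can I similarly compute the pairwise differences between the dataframe's columns? I guess this is equivalent to using a rolling window of 1.
Edit: as pointed out in the comments, the example above is not actually a column-wise correlation, which I had not noticed.
The following function comes close to a proper solution: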
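import numpy as np
import pandas as pd

def columnwise_difference(df):
    a = df.values
    # index pairs (r, c) with r < c, so each unordered column pair appears once
    r, c = np.triu_indices(a.shape[1], 1)  # pd.np is no longer available in pandas 2.x
    cols = df.columns
    nm = [cols[i] + "-" + cols[j] for i, j in zip(r, c)]
    return pd.DataFrame(a[:, r] - a[:, c], columns=nm, index=df.index)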
which gives:
EOS-BTC EOS-BNB BTC-BNB
time
2018-11-30 00:00:00+00:00 0.017185 0.006259 -0.010926
2018-12-01 00:00:00+00:00 -0.017803 -0.013367 0.004436
2018-12-02 00:00:00+00:00 -0.021908 -0.028943 -0.007035
2018-12-03 00:00:00+00:00 -0.037733 -0.079928 -0.042195
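...except that I don't want just the pairs from np.triu_indices,
but all 9 combinations, including EOS-EOS and so on (I still have to make a simple change to get that).
If you want 9 columns: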
# test data
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(-1,3), columns=list('abc'))
s = df.values
new_cols = pd.MultiIndex.from_product([df.columns, df.columns])
# the entry under (x, y) is column y minus column x
pd.DataFrame((s[:,None,:] - s[:, :, None]).reshape(len(df), -1),
             index=df.index,
             columns=new_cols)
Output:
   a        b        c
   a  b  c  a  b  c  a  b  c
0  0  1  2 -1  0  1 -2 -1  0
1  0  1  2 -1  0  1 -2 -1  0
2  0  1  2 -1  0  1 -2 -1  0
3  0  1  2 -1  0  1 -2 -1  0
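If you would rather keep the flat "EOS-BTC" style names from the question, or get a layout shaped like the rolling().corr() result, the same broadcasting idea can be adapted. The following is only a sketch under a few assumptions: the helper names pairwise_difference and pairwise_difference_stacked are made up for illustration, and the sign convention is "left minus right" as in the question's function, which is the opposite of the MultiIndex version above.

```python
import numpy as np
import pandas as pd


def pairwise_difference(df):
    """Hypothetical helper: all ordered column differences, named 'left-right'."""
    values = df.to_numpy()
    cols = list(df.columns)
    # diff[n, i, j] = values[n, i] - values[n, j]  (left minus right)
    diff = values[:, :, None] - values[:, None, :]
    names = [f"{left}-{right}" for left in cols for right in cols]
    return pd.DataFrame(diff.reshape(len(df), -1), index=df.index, columns=names)


def pairwise_difference_stacked(df):
    """Hypothetical helper: same numbers, one k x k block per original row,
    similar in shape to what rolling().corr() returns."""
    values = df.to_numpy()
    k = values.shape[1]
    diff = values[:, :, None] - values[:, None, :]
    # row label (t, x) and column y hold column x minus column y at row t
    index = pd.MultiIndex.from_product([df.index, df.columns])
    return pd.DataFrame(diff.reshape(-1, k), index=index, columns=df.columns)


# same test data as above
df = pd.DataFrame(np.arange(12).reshape(-1, 3), columns=list("abc"))
wide = pairwise_difference(df)             # 4 rows x 9 columns
stacked = pairwise_difference_stacked(df)  # 12 rows x 3 columns
```

With the question's data, wide would have columns EOS-EOS, EOS-BTC, EOS-BNB, BTC-EOS and so on, and the diagonal pairs such as EOS-EOS are all zero.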