Melt the Upper Triangular Matrix of a Pandas DataFrame
Given a square pandas DataFrame of the following form:
a b c
a 1 .5 .3
b .5 1 .4
c .3 .4 1
how can the upper triangle be melted to get a frame of the following form?
Row Column Value
a a 1
a b .5
a c .3
b b 1
b c .4
c c 1
#Note the combination a,b is only listed once. There is no b,a listing
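I'm more interested in an idiomatic pandas solution; a custom indexer would be easy enough to write by hand...
Thank you in advance for your consideration and response.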
First I convert the lower-triangle values of df to NaN with where and numpy.triu, then apply stack and reset_index, and finally set the column names:
import numpy as np
print(df)
a b c
a 1.0 0.5 0.3
b 0.5 1.0 0.4
c 0.3 0.4 1.0
print(np.triu(np.ones(df.shape)).astype(bool))
[[ True True True]
[False True True]
[False False True]]
# np.bool was removed from NumPy; the builtin bool does the same job here
df = df.where(np.triu(np.ones(df.shape)).astype(bool))
print(df)
a b c
a 1 0.5 0.3
b NaN 1.0 0.4
c NaN NaN 1.0
# explicit dropna() in case the installed pandas keeps NaN rows when stacking
df = df.stack().dropna().reset_index()
df.columns = ['Row','Column','Value']
print(df)
Row Column Value
0 a a 1.0
1 a b 0.5
2 a c 0.3
3 b b 1.0
4 b c 0.4
5 c c 1.0
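The steps above can be collected into one self-contained Python 3 sketch. The helper name melt_upper is hypothetical (it is not part of pandas or of the answer), and the example frame is the one from the question. The explicit dropna() is a defensive assumption for pandas versions in which stack() keeps NaN rows; where stack() already drops them it changes nothing.

```python
import numpy as np
import pandas as pd

# Example frame taken from the question.
df = pd.DataFrame(
    {"a": [1, 0.5, 0.3], "b": [0.5, 1, 0.4], "c": [0.3, 0.4, 1]},
    index=list("abc"),
)


def melt_upper(frame):
    """Hypothetical helper: long-form rows for the upper triangle of a square frame."""
    # True on and above the diagonal, False below it.
    mask = np.triu(np.ones(frame.shape)).astype(bool)
    # where() turns the masked-out cells into NaN; dropna() removes them after stacking.
    long_form = frame.where(mask).stack().dropna().reset_index()
    long_form.columns = ["Row", "Column", "Value"]
    return long_form


result = melt_upper(df)
print(result)
```

Passing k=1 to np.triu would leave out the diagonal as well, if the self-pairs (a,a), (b,b), (c,c) are not wanted.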
Building on @jezrael's solution, boolean indexing would be a more explicit approach:
import numpy as np
from pandas import DataFrame
df = DataFrame({'a':[1,.5,.3],'b':[.5,1,.4],'c':[.3,.4,1]},index=list('abc'))
print(df,'\n')
# flatten the upper-triangle mask so it lines up with the stacked Series
keep = np.triu(np.ones(df.shape)).astype('bool').reshape(df.size)
print(df.stack()[keep])
Output:
a b c
a 1.0 0.5 0.3
b 0.5 1.0 0.4
c 0.3 0.4 1.0
a a 1.0
b 0.5
c 0.3
b b 1.0
c 0.4
c c 1.0
dtype: float64
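A small sketch of why the flattened mask works: it relies on df.stack() returning exactly df.size entries in row-major order, which holds for this example because the frame has no missing values. The cross-check against np.triu_indices is my own addition for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 0.5, 0.3], "b": [0.5, 1, 0.4], "c": [0.3, 0.4, 1]},
    index=list("abc"),
)

# Upper-triangle mask flattened row by row (C order).
keep = np.triu(np.ones(df.shape)).astype(bool).ravel()

stacked = df.stack()
# The positional mask only makes sense if no cell was dropped while stacking.
assert len(stacked) == keep.size

upper = stacked[keep]

# Independent check: read the same cells straight from the NumPy array.
rows, cols = np.triu_indices(len(df))
direct = df.to_numpy()[rows, cols]

print(upper)
print(np.allclose(upper.to_numpy(), direct))
```

If the frame could contain NaN, masking with where() before stacking, as in the first answer, avoids depending on the stacked length.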
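Also building on @jezrael's solution, here is a version that adds a function to perform the inverse operation (from xy back to a matrix), which was useful in my case for covariance/correlation matrices.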
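import numpy as np
import pandas as pd


def matrix_to_xy(df, columns=None, reset_index=False):
    # True on and above the diagonal, False below it
    bool_index = np.triu(np.ones(df.shape)).astype(bool)
    # explicit dropna() in case the installed pandas keeps NaN rows when stacking
    xy = df.where(bool_index).stack().dropna()
    if reset_index:
        xy = xy.reset_index()
        xy.columns = columns or ["row", "col", "val"]
    return xy


def xy_to_matrix(xy):
    # pivot() only accepts keyword arguments in pandas 2.x,
    # so the three column names are passed explicitly
    row, col, val = xy.columns
    df = xy.pivot(index=row, columns=col, values=val).fillna(0)
    df_vals = df.to_numpy()
    # strict upper triangle plus the transpose (diagonal and lower triangle)
    df = pd.DataFrame(
        np.triu(df_vals, 1) + df_vals.T, index=df.index, columns=df.index
    )
    return df


df = pd.DataFrame(
    {"a": [1, 0.5, 0.3], "b": [0.5, 1, 0.4], "c": [0.3, 0.4, 1]},
    index=list("abc"),
)
print(df)
xy = matrix_to_xy(df, reset_index=True)
print(xy)
mx = xy_to_matrix(xy)
print(mx)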
Output:
a b c
a 1.0 0.5 0.3
b 0.5 1.0 0.4
c 0.3 0.4 1.0
row col val
0 a a 1.0
1 a b 0.5
2 a c 0.3
3 b b 1.0
4 b c 0.4
5 c c 1.0
row a b c
row
a 1.0 0.5 0.3
b 0.5 1.0 0.4
c 0.3 0.4 1.0
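To check that the round trip really reproduces the original matrix, something like the following minimal sketch can be used. It inlines the two steps rather than reusing the functions above, and it assumes the row/column labels are already in sorted order, because pivot() sorts them; with unsorted labels the pivoted frame may no longer be upper triangular and the mirroring step would need a reindex first.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 0.5, 0.3], "b": [0.5, 1, 0.4], "c": [0.3, 0.4, 1]},
    index=list("abc"),
)

# Forward: keep the upper triangle and melt it to long form.
mask = np.triu(np.ones(df.shape)).astype(bool)
xy = df.where(mask).stack().dropna().reset_index()
xy.columns = ["row", "col", "val"]

# Backward: pivot to an upper-triangular frame, zero-filled below the diagonal.
wide = xy.pivot(index="row", columns="col", values="val").fillna(0)
upper = wide.to_numpy()

# Mirror: strict upper triangle plus the transpose (diagonal and lower triangle).
full = np.triu(upper, 1) + upper.T
mx = pd.DataFrame(full, index=wide.index, columns=wide.index)

print(np.allclose(mx.to_numpy(), df.to_numpy()))
```

The mirroring only makes sense for symmetric inputs such as covariance or correlation matrices; for a non-symmetric frame the lower triangle is lost in the forward step and cannot be recovered.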