Triangular heatmap from 3-column dataframe

I have a dataframe with two categorical columns and a third column of integers:

import pandas as pd

df1 = pd.DataFrame({
    'First': ['A','A','A','B','B','C'], 
    'Second': ['B','C','D','C','D','D'], 
    'Value': [1,2,3,4,5,6]}
)

df1

  First Second  Value
0     A      B      1
1     A      C      2
2     A      D      3
3     B      C      4
4     B      D      5
5     C      D      6

I would like to get the corresponding triangular matrix, like this (missing values can be NAs):

A B C D
  1 2 3 A
    4 5 B
      6 C

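Finally, I want to plot it as a triangular heatmap, which I believe I can do with the help of this question. However, that approach requires a numpy masked array as input. Other solutions that plot this without numpy are also very welcome.

Any pythonic ideas on how to accomplish this?

EDIT:

I realized that the example I gave was too simple. My columns are not organized the way they are above. What I actually have looks like this: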

df1 = pd.DataFrame({
    'First': ['D','C','B','A','C','A','B','D','B','C'],
    'Second': ['E','E','C','D','D','E','E','B','A','A'],
    'Value': [1,2,3,4,5,6,7,8,9,10]}
)

      First Second  Value
0     D      E      1
1     C      E      2
2     B      C      3
3     A      D      4
4     C      D      5
5     A      E      6
6     B      E      7
7     D      B      8
8     B      A      9
9     C      A     10

df1.pivot(index='First', columns='Second', values='Value')

produces

Second     A    B    C    D    E
First
A        NaN  NaN  NaN  4.0  6.0
B        9.0  NaN  3.0  NaN  7.0
C       10.0  NaN  NaN  5.0  2.0
D        NaN  8.0  NaN  NaN  1.0

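a matrix that is neither triangular nor symmetric. I need the same number of rows and columns, with all the NaNs pushed to one side so that the values form a triangle. pivot on its own does not seem to be a viable solution.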

EDIT2

A solution exists, and the desired output is:

    A   B   C   D   E
A   NaN 9   10  4   6
B   NaN NaN 3   8   7
C   NaN NaN NaN 5   2
D   NaN NaN NaN NaN 1
E   NaN NaN NaN NaN NaN

You can pivot and then pass the DataFrame to the solution you linked:

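# keyword arguments: newer pandas releases no longer accept these positionally
df = df1.pivot(index='First', columns='Second', values='Value')
print(df)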
Second    B    C    D
First                
A       1.0  2.0  3.0
B       NaN  4.0  5.0
C       NaN  NaN  6.0

from matplotlib import pyplot as PLT

fig = PLT.figure()
ax1 = fig.add_subplot(111)
# pyplot.get_cmap is used because cm.get_cmap was removed in newer matplotlib releases
cmap = PLT.get_cmap('jet', 10) # jet doesn't have white color
cmap.set_bad('w') # default value is 'k'
# pass the pivoted DataFrame directly; NaN cells are drawn in the "bad" color
ax1.imshow(df, interpolation="nearest", cmap=cmap)
ax1.grid(True)
PLT.show()
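
Since the question mentions that the linked approach expects a numpy masked array, here is a minimal sketch of that conversion. It assumes the first, simpler df1 from the question and uses np.ma.masked_invalid to mask the NaN cells; the variable name masked is only illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'First': ['A', 'A', 'A', 'B', 'B', 'C'],
    'Second': ['B', 'C', 'D', 'C', 'D', 'D'],
    'Value': [1, 2, 3, 4, 5, 6],
})
df = df1.pivot(index='First', columns='Second', values='Value')

# Mask every NaN cell so that plotting code can treat it as "no data".
masked = np.ma.masked_invalid(df.to_numpy(dtype=float))
print(masked)
```

The resulting array can be passed to ax1.imshow in place of df if the masked form is preferred.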

EDIT: The solution is to sort the First and Second columns within each row:

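import numpy as np

# sort the two labels within each row so the smaller one always lands in 'First'
df1[['First','Second']] = np.sort(df1[['First','Second']], axis=1)
df = df1.pivot(index='First', columns='Second', values='Value')
print(df)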
Second    B     C    D    E
First                      
A       9.0  10.0  4.0  6.0
B       NaN   3.0  8.0  7.0
C       NaN   NaN  5.0  2.0
D       NaN   NaN  NaN  1.0

from matplotlib import pyplot as PLT

fig = PLT.figure()
ax1 = fig.add_subplot(111)
# pyplot.get_cmap is used because cm.get_cmap was removed in newer matplotlib releases
cmap = PLT.get_cmap('jet', 10) # jet doesn't have white color
cmap.set_bad('w') # default value is 'k'
# pass the pivoted DataFrame directly; NaN cells are drawn in the "bad" color
ax1.imshow(df, interpolation="nearest", cmap=cmap)
ax1.grid(True)
PLT.show()
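
The pivot above gives a 4x4 frame (rows A to D, columns B to E), while the desired output in EDIT2 is a square 5x5 frame. One possible way to close that gap is to reindex both axes with the full set of labels. The sketch below assumes each unordered pair appears at most once (otherwise pivot raises an error); the helper name to_upper_triangle is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def to_upper_triangle(frame, first='First', second='Second', value='Value'):
    """Hypothetical helper: build a square, upper-triangular frame from pair rows."""
    pairs = frame.copy()
    # Sort each pair so the smaller label always ends up in the `first` column.
    pairs[[first, second]] = np.sort(pairs[[first, second]].to_numpy(), axis=1)
    wide = pairs.pivot(index=first, columns=second, values=value)
    # Use every label on both axes so the result has equal rows and columns.
    labels = sorted(set(pairs[first]) | set(pairs[second]))
    return wide.reindex(index=labels, columns=labels)


df1 = pd.DataFrame({
    'First': ['D', 'C', 'B', 'A', 'C', 'A', 'B', 'D', 'B', 'C'],
    'Second': ['E', 'E', 'C', 'D', 'D', 'E', 'E', 'B', 'A', 'A'],
    'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
})

square = to_upper_triangle(df1)
print(square)
```

The values come back as floats because the NaN cells force a float dtype. The square frame can be handed to imshow in the same way as df above.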