Triangular heatmap from 3-column dataframe
I have a dataframe with two categorical columns and a third integer column:
import pandas as pd
df1 = pd.DataFrame({
'First': ['A','A','A','B','B','C'],
'Second': ['B','C','D','C','D','D'],
'Value': [1,2,3,4,5,6]}
)
df1
First Second Value
0 A B 1
1 A C 2
2 A D 3
3 B C 4
4 B D 5
5 C D 6
I would like to get the corresponding triangular matrix (missing values can be NAs):
    B   C   D
A   1   2   3
B       4   5
C           6
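Finally, I want to plot it as a triangular heatmap, which I believe I can do with the help of this question; however, that requires a numpy masked array as input. Other solutions for plotting this that don't use numpy are also very welcome.
Any pythonic ideas on how to achieve this?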
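Since the linked heatmap recipe expects a numpy masked array, here is a minimal sketch (assuming pandas and numpy are available) of getting one from the 3-column frame: pivot to a wide table, then mask every NaN cell. The variable names `wide` and `masked` are just for illustration.

```python
import numpy as np
import pandas as pd

# Example data from the question (pairs already in "upper-triangle" order)
df1 = pd.DataFrame({
    'First': ['A', 'A', 'A', 'B', 'B', 'C'],
    'Second': ['B', 'C', 'D', 'C', 'D', 'D'],
    'Value': [1, 2, 3, 4, 5, 6],
})

# Wide matrix: rows A..C, columns B..D; pairs that never occur become NaN
wide = df1.pivot(index='First', columns='Second', values='Value')

# Mask the NaN cells so code expecting a masked array skips them
masked = np.ma.masked_invalid(wide.to_numpy(dtype=float))
print(masked)
```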
EDIT:
I realized my example was too terse. My columns are not organized the way they are above, so I have something like this:
df1 = pd.DataFrame({
'First': ['D','C','B','A','C','A','B','D','B','C'],
'Second': ['E','E','C','D','D','E','E','B','A','A'],
'Value': [1,2,3,4,5,6,7,8,9,10]}
)
First Second Value
0 D E 1
1 C E 2
2 B C 3
3 A D 4
4 C D 5
5 A E 6
6 B E 7
7 D B 8
8 B A 9
9 C A 10
and running
df1.pivot(index='First', columns='Second', values='Value')
produces:
Second A B C D E
First
A NaN NaN NaN 4.0 6.0
B 9.0 NaN 3.0 NaN 7.0
C 10.0 NaN NaN 5.0 2.0
D NaN 8.0 NaN NaN 1.0
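This is a non-triangular, non-symmetric matrix. I need the same number of rows and columns, with all of those NaNs pushed to one side so that they form a triangle. Pivot alone doesn't seem to be the solution.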
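A quick way to see why this pivot is not triangular: any row whose First label sorts after its Second label ends up below the diagonal. A small check on the data above, using plain pandas string comparison (`reversed_pairs` is just an illustrative name):

```python
import pandas as pd

df1 = pd.DataFrame({
    'First': ['D', 'C', 'B', 'A', 'C', 'A', 'B', 'D', 'B', 'C'],
    'Second': ['E', 'E', 'C', 'D', 'D', 'E', 'E', 'B', 'A', 'A'],
    'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
})

# Rows where First > Second (alphabetically) fall below the diagonal of the pivot
reversed_pairs = df1[df1['First'] > df1['Second']]
print(reversed_pairs)
```

Only these rows (D/B, B/A, C/A) need their labels swapped to get a triangle.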
EDIT2
A solution exists; the desired output is:
A B C D E
A NaN 9 10 4 6
B NaN NaN 3 8 7
C NaN NaN NaN 5 2
D NaN NaN NaN NaN 1
E NaN NaN NaN NaN NaN
You can pivot and then pass the DataFrame to your linked solution:
df = df1.pivot(index='First', columns='Second', values='Value')
print(df)
Second B C D
First
A 1.0 2.0 3.0
B NaN 4.0 5.0
C NaN NaN 6.0
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots()
cmap = plt.get_cmap('jet', 10)  # 10-level jet; jet contains no white
cmap.set_bad('w')  # NaN cells use the "bad" color (transparent by default); draw them white
# imshow accepts the DataFrame directly and masks its NaNs
ax1.imshow(df, interpolation="nearest", cmap=cmap)
ax1.grid(True)
plt.show()
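`imshow` only knows array positions, so the axes above show 0, 1, 2 instead of the category names. A minimal sketch, assuming Matplotlib 3.5 or later (for the `labels` argument of `set_xticks`/`set_yticks`), that labels ticks and axes from the pivoted DataFrame; the `Agg` backend is only there so the sketch runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; drop this line to show the plot interactively
import matplotlib.pyplot as plt
import pandas as pd

df1 = pd.DataFrame({
    'First': ['A', 'A', 'A', 'B', 'B', 'C'],
    'Second': ['B', 'C', 'D', 'C', 'D', 'D'],
    'Value': [1, 2, 3, 4, 5, 6],
})
df = df1.pivot(index='First', columns='Second', values='Value')

cmap = plt.get_cmap('jet', 10)
cmap.set_bad('w')  # NaN (empty lower triangle) drawn white

fig, ax = plt.subplots()
im = ax.imshow(df, interpolation='nearest', cmap=cmap)

# Map array positions back to the category labels from the pivot
ax.set_xticks(range(len(df.columns)), list(df.columns))
ax.set_yticks(range(len(df.index)), list(df.index))
ax.set_xlabel(df.columns.name)  # 'Second'
ax.set_ylabel(df.index.name)    # 'First'
fig.colorbar(im, ax=ax)
```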
EDIT: the solution is to sort the First and Second values within each row:
import numpy as np

df1[['First','Second']] = np.sort(df1[['First','Second']], axis=1)
df = df1.pivot(index='First', columns='Second', values='Value')
print(df)
Second B C D E
First
A 9.0 10.0 4.0 6.0
B NaN 3.0 8.0 7.0
C NaN NaN 5.0 2.0
D NaN NaN NaN 1.0
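This pivot is 4×4 (rows A to D, columns B to E), while the EDIT2 output is a square 5×5 matrix with every label on both axes. A sketch, assuming that square layout is what you want, which reindexes the sorted pivot on the union of all labels (`labels` and `tri` are illustrative names):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'First': ['D', 'C', 'B', 'A', 'C', 'A', 'B', 'D', 'B', 'C'],
    'Second': ['E', 'E', 'C', 'D', 'D', 'E', 'E', 'B', 'A', 'A'],
    'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
})

# Put each pair in alphabetical order so every value lands above the diagonal
df1[['First', 'Second']] = np.sort(df1[['First', 'Second']].to_numpy(), axis=1)
tri = df1.pivot(index='First', columns='Second', values='Value')

# Use every label on both axes so the matrix is square (A..E x A..E)
labels = sorted(set(df1['First']) | set(df1['Second']))
tri = tri.reindex(index=labels, columns=labels)
print(tri)
```

The missing row E and column A are filled with NaN, which gives exactly the EDIT2 layout.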
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots()
cmap = plt.get_cmap('jet', 10)  # 10-level jet; jet contains no white
cmap.set_bad('w')  # NaN cells use the "bad" color (transparent by default); draw them white
# imshow accepts the DataFrame directly and masks its NaNs
ax1.imshow(df, interpolation="nearest", cmap=cmap)
ax1.grid(True)
plt.show()
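One caveat with sort-then-pivot: if the same pair appears in both orientations (say A/B and B/A), sorting makes the two rows identical and `pivot` raises a `ValueError` about duplicate entries. A hedged sketch on hypothetical data, assuming that summing such duplicates is acceptable (swap `aggfunc` for `'mean'` or similar if not), using `pivot_table`:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the pair A/B occurs twice, once in each orientation
df1 = pd.DataFrame({
    'First': ['A', 'B', 'A'],
    'Second': ['B', 'A', 'C'],
    'Value': [1, 2, 3],
})
df1[['First', 'Second']] = np.sort(df1[['First', 'Second']].to_numpy(), axis=1)

pivot_failed = False
try:
    df1.pivot(index='First', columns='Second', values='Value')
except ValueError as exc:
    pivot_failed = True
    print('pivot failed:', exc)

# pivot_table aggregates duplicate (First, Second) pairs instead of failing
tri = df1.pivot_table(index='First', columns='Second', values='Value', aggfunc='sum')
print(tri)
```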