How to cross hatch data gaps in a heatmap

Question

I am looking for some help plotting a heatmap that should look like this:

Along the x-axis, the data set is the range of years from 1975 to 2018: [1975, ..., 2018]

For the y-axis: an array of months [Jan to Dec]

For the value at each x-y intersection, as shown in the image, 1, 2 or 3 can be used.

In the image I added, the crosses represent data gaps and the blank cells represent zero (0) values.
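To make the layout concrete, here is a minimal sketch of data with the shape described above: months as rows, years 1975 to 2018 as columns. The values (random counts 0 to 3) and the randomly placed gaps are made up for illustration only; the key assumption is that a data gap is stored as NaN, while a zero is stored as 0.

```python
import numpy as np
import pandas as pd

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
years = list(range(1975, 2019))  # 1975 .. 2018 inclusive

rng = np.random.default_rng(0)
# Illustrative values only: 0 (blank cell) or 1, 2, 3
values = rng.integers(0, 4, size=(len(months), len(years))).astype(float)
# Mark a few cells as data gaps: NaN (not 0) is what should be cross-hatched
gap_rows = rng.integers(0, len(months), size=10)
gap_cols = rng.integers(0, len(years), size=10)
values[gap_rows, gap_cols] = np.nan

data = pd.DataFrame(values, index=months, columns=years)
print(data.shape)  # (12, 44)
```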

Update:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df = pd.read_csv('Events_in_Month_and_Year.xlsx', encoding='unicode_escape', error_bad_lines=False)
pivoted = df.pivot_table(index='month', columns='year', aggfunc=len, fill_value=0)
pivoted = pivoted.loc[months]  # change the order of the rows to be the same as months
for _ in range(20):
    # set some random locations to "not filled in"
    pivoted.iloc[np.random.randint(0, len(pivoted)), np.random.randint(0, len(pivoted.columns))] = np.nan
max_val = np.nanmax(pivoted.to_numpy())
ax = sns.heatmap(pivoted, cmap=plt.get_cmap('Greys', max_val + 1), vmin=-0.5, vmax=max_val + 0.5)
ax.patch.set_facecolor('white')
ax.patch.set_edgecolor('black')  # will be used for hatching
ax.patch.set_hatch('xxxx')
spines = ax.collections[0].colorbar.ax.spines
for s in spines:
    spines[s].set_visible(True) # show border around colorbar
plt.tight_layout()
plt.show()

I tried this code, but I get an error:

Error tokenizing data. C error: Buffer overflow caught - possible malformed input file

My data is stored in an .xlsx file, as shown below:
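The error is most likely caused by handing the .xlsx workbook to pd.read_csv: an .xlsx file is a zipped binary format, not comma-separated text, so the C parser cannot tokenize it. Below is a minimal sketch that picks the reader from the file extension. load_table is a hypothetical helper name, and reading .xlsx with pd.read_excel assumes an Excel engine such as openpyxl is installed.

```python
from pathlib import Path

import pandas as pd


def load_table(path, **kwargs):
    """Hypothetical helper: pick the pandas reader from the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix in ('.xlsx', '.xlsm', '.xls'):
        # Excel workbooks are binary; read_excel needs an engine
        # (e.g. openpyxl for .xlsx) to be installed
        return pd.read_excel(path, **kwargs)
    # Anything else is treated as delimited text such as .csv
    return pd.read_csv(path, **kwargs)


# Usage with the file name from the question:
# df = load_table('Events_in_Month_and_Year.xlsx', index_col=0)
```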

Answer 1

You can use sns.heatmap to create a heatmap. You can hatch the background via ax.patch.set_hatch('xx') (more xs mean a denser hatch pattern). See the matplotlib gallery for more hatching options.
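The trick works because NaN cells are not painted, so the axes background patch shows through them, and hatching that patch marks exactly those cells. A minimal matplotlib-only sketch of the same idea, using imshow instead of sns.heatmap just to avoid a seaborn dependency, with three hatch densities side by side (the small array is made up):

```python
import matplotlib.pyplot as plt
import numpy as np

values = np.array([[0, 1, np.nan],
                   [2, np.nan, 3]])

fig, axes = plt.subplots(1, 3, figsize=(9, 2.5))
for ax, hatch in zip(axes, ['x', 'xx', 'xxxx']):
    # NaN cells get the colormap's "bad" color, which is transparent by default,
    # so the hatched axes background shows through them
    ax.imshow(values, cmap='Greys', vmin=0, vmax=3)
    ax.patch.set_facecolor('white')
    ax.patch.set_edgecolor('black')  # the hatch is drawn in the edge color
    ax.patch.set_hatch(hatch)        # repeating the symbol makes the hatch denser
    ax.set_title(repr(hatch))
fig.tight_layout()
plt.show()
```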

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df = pd.DataFrame({'month': np.random.choice(months, 1000), 'year': np.random.randint(1975, 2019, 1000)})
pivoted = df.pivot_table(index='month', columns='year', aggfunc=len, fill_value=0)
pivoted = pivoted.loc[months]  # change the order of the rows to be the same as months
for _ in range(20):
    # set some random locations to "not filled in"
    pivoted.iloc[np.random.randint(0, len(pivoted)), np.random.randint(0, len(pivoted.columns))] = np.nan
max_val = np.nanmax(pivoted.to_numpy())
ax = sns.heatmap(pivoted, cmap=plt.get_cmap('Greys', max_val + 1), vmin=-0.5, vmax=max_val + 0.5)
ax.patch.set_facecolor('white')
ax.patch.set_edgecolor('black')  # will be used for hatching
ax.patch.set_hatch('xxxx')
ax.collections[0].colorbar.outline.set_linewidth(1) # make outline visible
plt.tight_layout()
plt.show()
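Why vmin=-0.5 and vmax=max_val + 0.5: plt.get_cmap('Greys', n) resamples the colormap to n discrete colors, and shifting the limits by half a unit puts each integer 0 .. n-1 in the middle of its own color band, so every count gets exactly one grey level (0 being white in Greys). A small check of that mapping, assuming counts 0 to 3 as in the question:

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

max_val = 3                                # counts 0, 1, 2, 3 as in the question
cmap = plt.get_cmap('Greys', max_val + 1)  # 4 discrete grey levels
norm = Normalize(vmin=-0.5, vmax=max_val + 0.5)

for k in range(max_val + 1):
    # norm(k) = (k + 0.5) / 4 lies in the centre of band k,
    # so it selects lookup-table entry k of the discrete colormap
    print(k, float(norm(k)), cmap(norm(k)))
```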

PS: If you have the raw data, e.g. in Excel, you can save it as a csv file and load it with df = pd.read_csv(filename).

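Code for a file similar to the one in the post could look as follows. To distinguish between 0 and a "data gap", missing data can be represented by empty cells in the Excel file.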
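A minimal sketch of why empty cells work: once the sheet is saved as CSV, an empty cell becomes an empty field, which pd.read_csv reads as NaN, while an explicit 0 stays 0. The small table is made-up sample data in the layout assumed by this answer's code (years as rows, months as columns).

```python
import io

import pandas as pd

# Made-up sample: years as rows, months as columns; 1976/Feb is an empty cell
csv_text = """year,Jan,Feb,Mar
1975,0,1,3
1976,2,,0
"""
pivoted = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(pivoted.isna())               # True only for the empty cell (1976, Feb)
print((pivoted == 0).sum().sum())   # zeros are kept as real values: 2
```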

Empty rows for missing years can be added by assigning a new index.
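Adding the missing years amounts to reindexing with the full range of years: rows that did not exist are created and filled with NaN, so they end up hatched as gaps. A small sketch using DataFrame.reindex on made-up data that only has rows for 1975 and 1978:

```python
import pandas as pd

# Made-up sample with a gap: no rows for 1976 and 1977
pivoted = pd.DataFrame({'Jan': [1, 0], 'Feb': [2, 3]}, index=[1975, 1978])

full_years = range(pivoted.index.min(), pivoted.index.max() + 1)
pivoted = pivoted.reindex(full_years)  # new rows 1976 and 1977 are all NaN

print(pivoted)
```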

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# read the dataframe from a .csv file
pivoted = pd.read_csv('test.csv', index_col=0) # maybe: delimiter=';'
# extend the index to include all intermediate years
pivoted = pd.DataFrame(pivoted, index=range(pivoted.index.min(), pivoted.index.max() + 1))
# exchange columns and rows
pivoted = pivoted.T 
max_val = np.nanmax(pivoted.to_numpy())
ax = sns.heatmap(pivoted, cmap=plt.get_cmap('Greys', max_val + 1), vmin=-0.5, vmax=max_val + 0.5,
                 cbar_kws={'ticks': np.arange(max_val+1)})
ax.patch.set_facecolor('white')
ax.patch.set_edgecolor('black')  # will be used for hatching
ax.patch.set_hatch('xxxx')
ax.collections[0].colorbar.outline.set_linewidth(1) # make outline visible
ax.collections[0].colorbar.outline.set_edgecolor('black')
plt.tight_layout()
plt.show()
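If seaborn is not available, the same recipe can be sketched with plain matplotlib. plot_gapped_heatmap is a hypothetical helper; it swaps sns.heatmap for pcolormesh, keeps the discrete Greys colormap with integer colorbar ticks, and hatches the background so that NaN cells show as gaps. It assumes, like the code above after the transpose, months as rows and years as columns.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_gapped_heatmap(table, hatch='xxxx'):
    """Hypothetical helper: integer counts per cell, NaN = data gap (hatched)."""
    values = table.to_numpy(dtype=float)
    max_val = int(np.nanmax(values))
    fig, ax = plt.subplots(figsize=(12, 4))
    # Hatched background: only visible where no cell is painted (NaN)
    ax.patch.set_facecolor('white')
    ax.patch.set_edgecolor('black')
    ax.patch.set_hatch(hatch)
    # Masked cells get the transparent "bad" color, so the hatch shows through
    mesh = ax.pcolormesh(np.ma.masked_invalid(values),
                         cmap=plt.get_cmap('Greys', max_val + 1),
                         vmin=-0.5, vmax=max_val + 0.5)
    # Put the row/column labels at the cell centres, first row on top
    ax.set_xticks(np.arange(values.shape[1]) + 0.5)
    ax.set_xticklabels(table.columns, rotation=90)
    ax.set_yticks(np.arange(values.shape[0]) + 0.5)
    ax.set_yticklabels(table.index)
    ax.invert_yaxis()
    fig.colorbar(mesh, ax=ax, ticks=np.arange(max_val + 1))
    fig.tight_layout()
    return fig, ax
```

Usage: fig, ax = plot_gapped_heatmap(pivoted), followed by plt.show().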


Tags: python, matplotlib, heatmap, pandas, seaborn