Creating a for loop or function to create multiple heatmaps from a dataframe

I'm fairly new to Python, so I'm not very good with for/while loops or functions yet.

Basically, I have a dataframe that looks like this:

temp | dewpoint | wind | precip_rate_hr | total_snow
-------------------------------------------------
31        20       3         0.2            2.1
29        25       12         0.01           0.7
30        30       17         0.5            4.1
...      ...      ...         ...            ...

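I have been using seaborn to create heatmaps that compare the first four columns pairwise, where each cell shows the average total_snow for that combination of the two variables. Apologies if that is worded poorly. Here is my code: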

snow_data_percentile_10_temp = np.percentile(snow_data['temp'], 10)
snow_data_percentile_50_temp = np.percentile(snow_data['temp'], 50)
snow_data_percentile_75_temp = np.percentile(snow_data['temp'], 75)

snow_data_percentile_10_dewpt = np.percentile(snow_data['dewpoint'], 10)
snow_data_percentile_50_dewpt = np.percentile(snow_data['dewpoint'], 50)
snow_data_percentile_75_dewpt = np.percentile(snow_data['dewpoint'], 75)

snow_data['temp_bin'] = pd.cut(snow_data['temp'], [0, 10.4, 23.5, 28.75, 37], labels=['<10.4', '10.4-23.5', '23.5-28.75', '>28.75'])
snow_data['dewpt_bin'] = pd.cut(snow_data['dewpoint'], [0, 4.1, 15, 19.75, 33], labels=['<4.1', '4.1-15', '15-19.75', '>19.75'])

avg_snow = snow_data.groupby(['temp_bin','dewpt_bin'], as_index=False)['total_snow'].mean()

data_fp = avg_snow.pivot_table(index='temp_bin', columns='dewpt_bin', values='total_snow')
sns.set(font_scale=1.2)
f, ax = plt.subplots(figsize=(25,25))
sns.set(font_scale=2.0)
sns.heatmap(data_fp, annot=True, fmt='g', linewidth=0.5) 
ax.set_title('Average Snow Total on Days that Met Specific Temperature and Dewpoint Criteria', fontsize=20)

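Here is a screenshot of the heatmap. The values shown in the cells are the average snow totals for those bins. Is there a way to simplify this code? I need to create heatmaps for temp vs. wind, temp vs. precip_rate_hr, dewpoint vs. wind, dewpoint vs. precip_rate_hr, and wind vs. precip_rate_hr. I also have another, larger dataset to process. Right now I just copy and paste the code into a new file and change a few things to get the remaining heatmaps. That doesn't take too long, but I'd like to automate it more and avoid ending up with a pile of code files. Any help would be appreciated!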

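Here is one way to create all of the plots in a loop. It's not the cleanest, because it produces duplicate plots (the same pair with the axes swapped).

My data: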

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

snow_data = pd.DataFrame(data={"temp": np.random.randint(20, 40, 50), "dewpoint": np.random.randint(15, 40, 50),
                               "wind": np.random.randint(0, 20, 50), "precip_rate_hr": np.random.random(50),
                               "total_snow": np.random.random(50)*10})

Creating the categories (this isn't in a loop because the bins are all different):

snow_data['temp_bin'] = pd.cut(snow_data['temp'], [0, 10.4, 23.5, 28.75, 37], labels=['<10.4', '10.4-23.5', '23.5-28.75', '>28.75'])
snow_data['dewpt_bin'] = pd.cut(snow_data['dewpoint'], [0, 4.1, 15, 19.75, 33], labels=['<4.1', '4.1-15', '15-19.75', '>19.75'])
snow_data['wind_bin'] = pd.cut(snow_data['wind'], [0, 5, 10, 15, 20], labels=['<5', '5-10', '10-15', '15-20'])
snow_data['precip_rate_hr_bin'] = pd.cut(snow_data['precip_rate_hr'], [0, 0.25, 0.5, 0.75, 1], labels=['<0.25', '0.25-0.5', '0.5-0.75', '>0.75'])
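
If the repeated pd.cut lines become tedious, the per-column edges and labels can live in a dictionary and be applied in a short loop. This is only a sketch: the `bin_specs` name and its layout are made up for illustration, and the sample frame is random data standing in for snow_data. Note that pd.cut returns NaN for values outside the edges, and by default also for values equal to the first edge (a wind of exactly 0, for instance), so `include_lowest=True` is used here.

```python
import numpy as np
import pandas as pd

# Random stand-in for snow_data; the values are illustrative only.
rng = np.random.default_rng(0)
snow_data = pd.DataFrame({
    "temp": rng.integers(20, 38, 50),   # 20..37, inside the temp edges
    "wind": rng.integers(0, 20, 50),    # 0..19, includes the first edge
})

# Hypothetical spec: source column -> (bin column name, edges, labels).
bin_specs = {
    "temp": ("temp_bin", [0, 10.4, 23.5, 28.75, 37],
             ["<10.4", "10.4-23.5", "23.5-28.75", ">28.75"]),
    "wind": ("wind_bin", [0, 5, 10, 15, 20],
             ["<5", "5-10", "10-15", "15-20"]),
}

for col, (bin_col, edges, labels) in bin_specs.items():
    # include_lowest=True keeps values equal to the first edge in the first bin.
    snow_data[bin_col] = pd.cut(snow_data[col], bins=edges, labels=labels,
                                include_lowest=True)
```

The same dictionary can then be reused for the larger dataset, changing only the edges and labels.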

The loop:

# List of all _bin columns to loop through
bin_cols = ['temp_bin', 'dewpt_bin', 'wind_bin', 'precip_rate_hr_bin']

# First factor
for i in bin_cols:
    # Second factor
    for j in bin_cols:
        # Need to ensure you aren't grouping the data by the same column twice!
        if j != i:
            # Average total_snow for each pair of bins
            avg_snow = snow_data.groupby([i, j], as_index=False)['total_snow'].mean()
            # Title for plot
            title = 'Average Snow Total on Days that Met Specific ' + i[: -4] + ' and ' + j[: -4] + ' Criteria'
            # Pivot table
            data_fp = avg_snow.pivot_table(index=i, columns=j, values='total_snow')
            # Plot
            sns.set(font_scale=1.2)
            f, ax = plt.subplots(figsize=(25, 25))
            sns.set(font_scale=2.0)
            sns.heatmap(data_fp, annot=True, fmt='g', linewidth=0.5)
            ax.set_title(title, fontsize=20)
            plt.show()
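
To avoid the duplicated plots, one option is to wrap the plotting in a function and iterate over unordered pairs with itertools.combinations instead of two nested loops. The sketch below is not the code from the question: the function name `plot_snow_heatmap`, the `demo` frame, its coarse bins, and the smaller figure size are all assumptions made for illustration.

```python
from itertools import combinations

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def plot_snow_heatmap(df, row_bin, col_bin, value="total_snow"):
    """Draw one heatmap of the mean of `value` per (row_bin, col_bin) pair."""
    # Mean per bin pair, reshaped so row_bin is the index and col_bin the columns.
    table = df.groupby([row_bin, col_bin], observed=True)[value].mean().unstack()
    fig, ax = plt.subplots(figsize=(10, 8))
    sns.heatmap(table, annot=True, fmt=".2g", linewidths=0.5, ax=ax)
    # Strip the "_bin" suffix for the title, as the loop above does with [:-4].
    ax.set_title(f"Average Snow Total by {row_bin[:-4]} and {col_bin[:-4]}")
    return fig, table


# Random stand-in data with deliberately coarse bins (illustrative only).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "temp": rng.integers(20, 38, 60),
    "wind": rng.integers(1, 21, 60),
    "precip_rate_hr": rng.random(60),
    "total_snow": rng.random(60) * 10,
})
demo["temp_bin"] = pd.cut(demo["temp"], [0, 23.5, 28.75, 37])
demo["wind_bin"] = pd.cut(demo["wind"], [0, 10, 20])
demo["precip_rate_hr_bin"] = pd.cut(demo["precip_rate_hr"], [0, 0.5, 1],
                                    include_lowest=True)

bin_cols = ["temp_bin", "wind_bin", "precip_rate_hr_bin"]
tables = {}
# combinations() yields each unordered pair once, so no swapped duplicates.
for row_bin, col_bin in combinations(bin_cols, 2):
    fig, tables[(row_bin, col_bin)] = plot_snow_heatmap(demo, row_bin, col_bin)
# Call plt.show() (or fig.savefig(...)) here when running interactively.
```

With the four _bin columns from this answer, combinations gives six pairs rather than twelve, which matches the six comparisons described in the question.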

I agree with Parfait's comment that np.percentile isn't needed, unless you are using those values to find suitable categories for the _bin columns.
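
If the percentiles are in fact meant to define the bins (an assumption, since the question does not say where its edges came from), they can be passed to pd.cut directly instead of being copied in by hand. A minimal sketch on made-up temperature values:

```python
import numpy as np
import pandas as pd

# Made-up temperatures; the values are illustrative only.
rng = np.random.default_rng(0)
temp = pd.Series(rng.normal(25, 6, 200), name="temp")

# Interior cut points at the 10th, 50th and 75th percentiles, as in the question.
cuts = np.percentile(temp, [10, 50, 75])
# Open-ended outer edges so every value lands in some bin.
edges = [-np.inf, *cuts, np.inf]
labels = [f"<{cuts[0]:.1f}",
          f"{cuts[0]:.1f}-{cuts[1]:.1f}",
          f"{cuts[1]:.1f}-{cuts[2]:.1f}",
          f">{cuts[2]:.1f}"]
temp_bin = pd.cut(temp, bins=edges, labels=labels)
```

pd.qcut(temp, [0, 0.1, 0.5, 0.75, 1]) should give an equivalent grouping in one call, with interval labels unless you pass your own.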