Creating a for loop or function to create multiple heatmaps from a dataframe
I'm fairly new to Python, so I'm not very good with for/while loops or functions.
Basically, I have a dataframe that looks like this:
temp | dewpoint | wind | precip_rate_hr | total_snow
-------------------------------------------------
31 20 3 0.2 2.1
29 25 12 0.01 0.7
30 30 17 0.5 4.1
... ... ... ... ...
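I have been using seaborn to create heatmaps that compare the first four columns pairwise, where each cell shows the average total_snow for that combination of the two variables. Sorry if that doesn't sound right. Here is my code: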
snow_data_percentile_10_temp = np.percentile(snow_data['temp'], 10)
snow_data_percentile_50_temp = np.percentile(snow_data['temp'], 50)
snow_data_percentile_75_temp = np.percentile(snow_data['temp'], 75)
snow_data_percentile_10_dewpt = np.percentile(snow_data['dewpoint'], 10)
snow_data_percentile_50_dewpt = np.percentile(snow_data['dewpoint'], 50)
snow_data_percentile_75_dewpt = np.percentile(snow_data['dewpoint'], 75)
snow_data['temp_bin'] = pd.cut(snow_data['temp'], [0, 10.4, 23.5, 28.75, 37], labels=['<10.4', '10.4-23.5', '23.5-28.75', '>28.75'])
snow_data['dewpt_bin'] = pd.cut(snow_data['dewpoint'], [0, 4.1, 15, 19.75, 33], labels=['<4.1', '4.1-15', '15-19.75', '>19.75'])
avg_snow = snow_data.groupby(['temp_bin','dewpt_bin'], as_index=False)['total_snow'].mean()
data_fp = avg_snow.pivot_table(index='temp_bin', columns='dewpt_bin', values='total_snow')
sns.set(font_scale=1.2)
f, ax = plt.subplots(figsize=(25,25))
sns.set(font_scale=2.0)
sns.heatmap(data_fp, annot=True, fmt='g', linewidth=0.5)
ax.set_title('Average Snow Total on Days that Met Specific Temperature and Dewpoint Criteria', fontsize=20)
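Here is a screenshot of the heatmap. The values shown in the cells are the average snow totals for those bins. Is there a way to simplify this code? I need to create heatmaps for temp vs. wind, temp vs. precip_rate_hr, dewpoint vs. wind, dewpoint vs. precip_rate_hr, and wind vs. precip_rate_hr. I also have another, larger dataset to work with. Right now I'm just copying and pasting the code into new files and changing parts of it to get the rest of the heatmaps. That doesn't take very long, but I'd like to automate it more and avoid ending up with a ton of code files. Any help would be appreciated!
Here is a way to create all the plots in a loop. It is not the cleanest, because it produces duplicate plots (with the axes swapped).
My data: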
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
snow_data = pd.DataFrame(data={"temp": np.random.randint(20, 40, 50), "dewpoint": np.random.randint(15, 40, 50),
"wind": np.random.randint(0, 20, 50), "precip_rate_hr": np.random.random(50),
"total_snow": np.random.random(50)*10})
Creating the categories (this is not done in a loop because the bins all differ):
snow_data['temp_bin'] = pd.cut(snow_data['temp'], [0, 10.4, 23.5, 28.75, 37], labels=['<10.4', '10.4-23.5', '23.5-28.75', '>28.75'])
snow_data['dewpt_bin'] = pd.cut(snow_data['dewpoint'], [0, 4.1, 15, 19.75, 33], labels=['<4.1', '4.1-15', '15-19.75', '>19.75'])
snow_data['wind_bin'] = pd.cut(snow_data['wind'], [0, 5, 10, 15, 20], labels=['<5', '5-10', '10-15', '15-20'])
snow_data['precip_rate_hr_bin'] = pd.cut(snow_data['precip_rate_hr'], [0, 0.25, 0.5, 0.75, 1], labels=['<0.25', '0.25-0.5', '0.5-0.75', '>0.75'])
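If the four pd.cut calls start to feel repetitive, one option is to keep the edges and labels in a dictionary and loop over it. This is only a sketch: the name bin_specs is made up for illustration, and the three sample rows are the ones from the table in the question. Note that pd.cut returns NaN for any value that falls outside the given edges (and, by default, for a value exactly equal to the lowest edge), so the edges need to cover the real data.

```python
import pandas as pd

# The three sample rows shown in the question
snow_data = pd.DataFrame({
    "temp": [31, 29, 30],
    "dewpoint": [20, 25, 30],
    "wind": [3, 12, 17],
    "precip_rate_hr": [0.2, 0.01, 0.5],
    "total_snow": [2.1, 0.7, 4.1],
})

# Hypothetical lookup: source column -> (new column name, bin edges, labels)
bin_specs = {
    "temp": ("temp_bin", [0, 10.4, 23.5, 28.75, 37],
             ["<10.4", "10.4-23.5", "23.5-28.75", ">28.75"]),
    "dewpoint": ("dewpt_bin", [0, 4.1, 15, 19.75, 33],
                 ["<4.1", "4.1-15", "15-19.75", ">19.75"]),
    "wind": ("wind_bin", [0, 5, 10, 15, 20],
             ["<5", "5-10", "10-15", "15-20"]),
    "precip_rate_hr": ("precip_rate_hr_bin", [0, 0.25, 0.5, 0.75, 1],
                       ["<0.25", "0.25-0.5", "0.5-0.75", ">0.75"]),
}

# One pd.cut call per entry; values outside the edges become NaN
for col, (bin_col, edges, labels) in bin_specs.items():
    snow_data[bin_col] = pd.cut(snow_data[col], bins=edges, labels=labels)

# The names of the new columns, ready to be used as the list of _bin columns
bin_cols = [bin_col for bin_col, _, _ in bin_specs.values()]
```

Switching to the larger dataset would then only mean editing the dictionary, not the loop.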
The loop:
# List of all _bin columns to loop through
bin_cols = ['temp_bin', 'dewpt_bin', 'wind_bin', 'precip_rate_hr_bin']
# First factor
for i in bin_cols:
    # Second factor
    for j in bin_cols:
        # Need to ensure you aren't grouping the data by the same column twice!
        if j != i:
            # Average snow total for each combination of the two bin groups
            avg_snow = snow_data.groupby([i, j], as_index=False)['total_snow'].mean()
            # Title for plot (i[:-4] strips the trailing '_bin' from the column name)
            title = 'Average Snow Total on Days that Met Specific ' + i[:-4] + ' and ' + j[:-4] + ' Criteria'
            # Pivot table
            data_fp = avg_snow.pivot_table(index=i, columns=j, values='total_snow')
            # Plot
            sns.set(font_scale=1.2)
            f, ax = plt.subplots(figsize=(25, 25))
            sns.set(font_scale=2.0)
            sns.heatmap(data_fp, annot=True, fmt='g', linewidth=0.5)
            ax.set_title(title, fontsize=20)
            plt.show()
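To avoid the duplicate plots, the nested loop could be swapped for itertools.combinations, which yields each unordered pair of columns exactly once (6 pairs for 4 columns). The sketch below wraps this in two functions so it can be reused on another dataset; the function names mean_snow_pivot and plot_pair_heatmaps, the smaller figure size, and the title wording are my own choices for illustration, not part of the code above.

```python
from itertools import combinations

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def mean_snow_pivot(df, row_bin, col_bin, value="total_snow"):
    """Return a table of the mean of `value` for each (row_bin, col_bin) pair."""
    return df.pivot_table(index=row_bin, columns=col_bin, values=value,
                          aggfunc="mean", observed=False)


def plot_pair_heatmaps(df, bin_cols, value="total_snow"):
    """Draw one heatmap per unordered pair of bin columns; return the figures."""
    figures = []
    for row_bin, col_bin in combinations(bin_cols, 2):
        pivot = mean_snow_pivot(df, row_bin, col_bin, value)
        fig, ax = plt.subplots(figsize=(10, 8))
        sns.heatmap(pivot, annot=True, fmt="g", linewidths=0.5, ax=ax)
        ax.set_title(f"Average {value} by {row_bin} and {col_bin}")
        figures.append(fig)
    return figures
```

With the snow_data and bin_cols from above, calling plot_pair_heatmaps(snow_data, bin_cols) followed by plt.show() should display the six heatmaps, and the same call works for a second dataframe as long as it has the same _bin columns.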
I agree with Parfait's comment that np.percentile is not needed, unless you are using those values to find suitable categories for the _bin columns.
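If the percentiles are in fact meant to define the categories, they can be fed straight into pd.cut instead of being typed in by hand. This is a rough sketch under that assumption; percentile_bins is a hypothetical helper, and the 10/50/75 cut points are simply the ones computed in the question.

```python
import numpy as np
import pandas as pd


def percentile_bins(series, percentiles=(10, 50, 75)):
    """Bin a numeric series using its own percentiles as the inner bin edges."""
    inner = np.percentile(series, percentiles)
    # Use the observed min and max as the outer edges so no value is left out
    edges = np.concatenate(([series.min()], inner, [series.max()]))
    # include_lowest keeps the minimum value; duplicates="drop" merges repeated edges
    return pd.cut(series, bins=edges, include_lowest=True, duplicates="drop")


# Example with evenly spaced values 0..40
binned = percentile_bins(pd.Series(np.arange(41)))
```

The result is a categorical column of intervals, which can be assigned to something like snow_data['temp_bin'] and used in the groupby exactly like the hand-labelled bins.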