Plot "stacked" density distributions of variables, categorized by 0 or 1, in Python
I have the following dataset:
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 6)), columns = ['Var_1', 'Var_2', 'Var_3', 'Var_4', 'Var_5', 'Var_6'])
df['Status'] = np.random.randint(0, 2, size=(100, 1))
df
Out[1]:
Var_1 Var_2 Var_3 Var_4 Var_5 Var_6 Status
0 32 65 48 83 60 21 1
1 44 49 65 84 52 34 1
2 9 2 3 14 82 80 1
3 66 90 97 60 28 12 0
4 28 95 64 53 39 30 1
.. ... ... ... ... ... ... ...
95 22 4 43 9 79 46 1
96 10 26 91 59 99 93 0
97 10 31 33 15 99 25 1
98 41 48 80 65 58 18 1
99 39 42 22 56 91 40 1
[100 rows x 7 columns]
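How can I create a "stacked" density distribution plot for each variable, categorized by Status (0 or 1)? I would like the plot to look like this:
This plot was created in R. The plot in Python does not need to look exactly the same. What code could I use to accomplish this? Thanks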
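Here is an adaptation of seaborn's ridgeplot example to the given structure. multiple='stack' is passed to sns.kdeplot so the curves for the two Status values are stacked on top of each other (the default, multiple='layer', draws every curve starting from y=0). Note that common_norm defaults to True, which scales each curve by its share of the samples, so the areas under both curves together sum to 1.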
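To make the effect of common_norm a bit more concrete, here is a minimal NumPy-only sketch. It does not call seaborn; it hand-rolls a Gaussian KDE with a fixed bandwidth (seaborn picks its bandwidth differently), and the helper name gaussian_kde_on_grid, the group sizes, and the bandwidth are all made up for illustration. It only shows the scaling idea: each group's density is multiplied by its share of the total sample count.

```python
import numpy as np


def gaussian_kde_on_grid(samples, grid, bandwidth):
    """Hypothetical helper: mean of Gaussian kernels centered on each sample."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=1)


rng = np.random.default_rng(0)
# Two made-up groups of unequal size, standing in for Status 0 and Status 1
groups = {0: rng.normal(0.0, 1.0, 300), 1: rng.normal(5.0, 1.0, 100)}
grid = np.linspace(-8.0, 13.0, 2101)
dx = grid[1] - grid[0]
n_total = sum(len(samples) for samples in groups.values())

areas_separate = {}  # each curve normalized on its own
areas_common = {}    # each curve scaled by its share of all samples
for status, samples in groups.items():
    density = gaussian_kde_on_grid(samples, grid, bandwidth=0.5)
    areas_separate[status] = density.sum() * dx
    areas_common[status] = (density * len(samples) / n_total).sum() * dx

print(areas_separate)
print(areas_common)
```

With separate normalization each area comes out close to 1; with the common scaling the areas are close to 0.75 and 0.25, matching the 300:100 split. If you would rather have each Status curve integrate to 1 on its own, common_norm=False in sns.kdeplot is the switch to try.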
Because seaborn works best with data in "long form", pd.melt() is used to convert the given dataframe. The long form looks like this:
      Status variable      value
0          0    Var 1  -0.961877
1          1    Var 1   6.454942
2          0    Var 1   6.020015
3          0    Var 1   7.094057
4          0    Var 1  10.289022
...      ...      ...        ...
2995       0    Var 6  -5.718156
2996       0    Var 6  -5.142314
2997       0    Var 6  -5.155104
2998       1    Var 6   3.339401
2999       1    Var 6   7.912669
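As a small illustration of what the melt step does, here is a sketch on a tiny frame. The values are the first four rows of Var_1 and Var_2 from the question's table; the names wide_df and long_df are just placeholders.

```python
import pandas as pd

# First four rows of two columns from the question's table
wide_df = pd.DataFrame({
    "Var_1": [32, 44, 9, 66],
    "Var_2": [65, 49, 2, 90],
    "Status": [1, 1, 1, 0],
})

# One row per (original row, variable) pair; Status is repeated for each variable
long_df = wide_df.melt(id_vars="Status", value_vars=["Var_1", "Var_2"])
print(long_df)
```

The result has the columns Status, variable and value, with all Var_1 rows first and the Var_2 rows after them. That is the shape FacetGrid needs in order to split the rows by "variable".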
Here is a complete code example:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})

# Create the data
rs = np.random.RandomState(1979)
data = rs.randn(30, 100).cumsum(axis=1).reshape(-1, 6)
column_names = [f'Var {i}' for i in range(1, 7)]
df = pd.DataFrame(data, columns=column_names)
df['Status'] = rs.randint(0, 2, len(df))
# Shift the Status == 1 rows so the two groups are visibly different
for col in column_names:
    df.loc[df['Status'] == 1, col] += 5
df_long = df.melt(id_vars='Status', value_vars=column_names)

# Initialize the FacetGrid object
g = sns.FacetGrid(data=df_long, row="variable", aspect=6, height=1.8)

# Draw the densities
g.map_dataframe(sns.kdeplot, "value",
                bw_adjust=.5, clip_on=False, fill=True, alpha=1, linewidth=1.5,
                hue="Status", hue_order=[0, 1], palette=['tomato', 'turquoise'], multiple='stack')
g.map(plt.axhline, y=0, lw=2, clip_on=False, color='black')

# Define and use a simple function to label the plot in axes coordinates
def label(x, color):
    ax = plt.gca()
    ax.text(0, .2, x.iloc[0], fontweight="bold", color='black',
            ha="left", va="center", transform=ax.transAxes)

g.map(label, "variable")

# Set the subplots to overlap
g.fig.subplots_adjust(hspace=-.25)

# Remove axes details that don't play well with overlap
g.set_titles("")
g.set(yticks=[], xlabel="")
g.despine(bottom=True, left=True)
plt.show()