Display a bar chart with proportions and fill not working
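I'm using plotnine to draw some plots. When I try to display a bar chart of proportions instead of counts, the fill parameter stops having any effect. I noticed that removing the group=1 parameter makes fill "active" again. However, without group=1 the proportions are not computed correctly.
Here is my function: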
def plot_churn(df_):
    color_dict = {
        'Stayed': 'green',
        'Churned': 'red'
    }
    myplot = ggplot(data=df_, mapping=aes(x='Flag_Churned', fill='Flag_Churned'))
    myplot += geom_bar(mapping=aes(y="stat(prop)", group=1))
    myplot += theme(subplots_adjust={'right': 0.71})
    myplot += facet_wrap('Flag_Treat')
    myplot += scale_fill_manual(color_dict)
    myplot += scale_y_continuous(labels=percent_format())
    print(myplot)
For example, with the following pandas DataFrame:
data = {'Churn': [0,0,0,1,1,0,1,1], 'Flag_Treat': ['treated','treated','treated','treated','not treated','not treated','not treated','not treated'],
'Flag_Churned': ['Stayed', 'Stayed', 'Stayed', 'Churned', 'Churned', 'Stayed', 'Churned', 'Churned']}
df = pd.DataFrame(data=data)
The resulting plot is not filled by 'Flag_Churned':
What am I doing wrong?
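The problem is that stat(prop) computes the proportions per group within each facet. Setting the group aesthetic gives you the correct proportions, but it overrides the grouping by fill. Coming from an R background, I know how to do this computation on the fly in R. However, the recommended (and usually easier) approach in R is to aggregate the data before passing it to ggplot and to use geom_col instead of geom_bar: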
from mizani.formatters import percent_format
from plotnine import *
import pandas as pd

data = {'Churn': [0,0,0,1,1,0,1,1], 'Flag_Treat': ['treated','treated','treated','treated','not treated','not treated','not treated','not treated'],
'Flag_Churned': ['Stayed', 'Stayed', 'Stayed', 'Churned', 'Churned', 'Stayed', 'Churned', 'Churned']}
df = pd.DataFrame(data=data)

def plot_churn(df_):
    color_dict = {
        'Stayed': 'green',
        'Churned': 'red'
    }
    # Count the rows in each (facet, bar) combination; the count lands in the 'Churn' column.
    df_ = df_.groupby(['Flag_Treat', 'Flag_Churned']).agg(len)
    # Turn counts into proportions within each facet (Flag_Treat).
    df_ = (df_ / df_.groupby(level='Flag_Treat').transform('sum')).reset_index()
    # geom_col plots the precomputed y values as-is, so fill keeps working.
    myplot = ggplot(data=df_, mapping=aes(x='Flag_Churned', y='Churn', fill='Flag_Churned'))
    myplot += geom_col()
    myplot += theme(subplots_adjust={'right': 0.71})
    myplot += facet_wrap('Flag_Treat')
    myplot += scale_fill_manual(color_dict)
    myplot += scale_y_continuous(labels=percent_format())
    print(myplot)

plot_churn(df)
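To see why the grouping matters, here is a pandas-only sketch that imitates what a per-group proportion would look like under two groupings. It is an illustration under stated assumptions, not plotnine's actual implementation: it assumes that prop is the count divided by the total count of the group, that the default grouping puts each bar in its own group, and that group=1 puts all bars of a facet in one group. The column names prop_default and prop_group1 are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Flag_Treat': ['treated'] * 4 + ['not treated'] * 4,
    'Flag_Churned': ['Stayed', 'Stayed', 'Stayed', 'Churned',
                     'Churned', 'Stayed', 'Churned', 'Churned'],
})

# Count the rows behind each bar, per facet.
counts = (
    df.groupby(['Flag_Treat', 'Flag_Churned'])
      .size()
      .reset_index(name='count')
)

# Assumed default grouping: every bar is its own group,
# so each bar is divided by its own count.
per_bar_total = counts.groupby(['Flag_Treat', 'Flag_Churned'])['count'].transform('sum')
counts['prop_default'] = counts['count'] / per_bar_total

# Assumed group=1 behaviour: all bars in a facet share one group,
# so each bar is divided by the facet total.
per_facet_total = counts.groupby('Flag_Treat')['count'].transform('sum')
counts['prop_group1'] = counts['count'] / per_facet_total

print(counts)
```

Under these assumptions the first column is 1.0 for every bar, which is why the proportions look wrong without group=1, while the second column matches the values the aggregated geom_col version plots.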