Add 95% confidence intervals as error bars to pandas bar plot
I want to add 95% confidence interval error bars to a pandas bar plot, like here. This is what my data looks like:
ciRatings.head(20)
                        count      mean       std
condition envCond
c01       CSNoisyLvl1      40  4.875000  0.404304
          CSNoisyLvl2      40  4.850000  0.361620
          LabNoisyLvl1     52  4.826923  0.382005
          LabNoisyLvl2     52  4.826923  0.430283
          LabQuiet         92  4.826087  0.408930
c02       CSNoisyLvl1      40  2.825000  0.902631
          CSNoisyLvl2      40  3.000000  0.816497
          LabNoisyLvl1     52  3.250000  1.218726
          LabNoisyLvl2     52  3.096154  1.089335
          LabQuiet         92  2.956522  1.036828
c03       CSNoisyLvl1      40  3.750000  0.669864
          CSNoisyLvl2      40  3.775000  0.659740
          LabNoisyLvl1     52  4.307692  0.728643
          LabNoisyLvl2     52  4.288462  0.723188
          LabQuiet         92  3.967391  0.790758
c06       CSNoisyLvl1      40  4.450000  0.638508
          CSNoisyLvl2      40  4.250000  0.669864
          LabNoisyLvl1     52  4.692308  0.578655
          LabNoisyLvl2     52  4.384615  0.599145
          LabQuiet         92  4.717391  0.452735
I looked at the pandas documentation on how to use error bars and tried to replicate their code example. I came up with the following:
import math
import matplotlib.pyplot as plt

# calculate range of CI around mean (as it is symmetric)
ci95_lower = []
for i in ciRatings.index:
    count, mean, std = ciRatings.loc[i]
    ci95_lower.append(mean - 1.96*std/math.sqrt(count))
ciRatings['CI95_lower'] = ci95_lower
ciRatings['CI95_range'] = ciRatings['mean'] - ciRatings['CI95_lower']

# extract CI range and means
ciRange = ciRatings[['CI95_range']]
ciRange = ciRange.unstack()
ciRatings = ciRatings[['mean']]

# bar plot with CI95 as error lines
ciBarPlot = ciRatings.unstack().plot(kind='bar', yerr=ciRange, capsize=4)
plt.show()
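However, this produces the plot below, which clearly has no error bars. What is my mistake? I think I misunderstood exactly what I have to pass to the plot function as the yerr argument.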
Edit: Using Quang Hoang's answer, I changed my code as follows to get the desired confidence interval bars:
# calculate range of CI around mean (as it is symmetric)
ci95_lower = []
for i in ciRatings.index:
    count, mean, std = ciRatings.loc[i]
    ci95_lower.append(mean - 1.96*std/math.sqrt(count))
ciRatings['CI95_lower'] = ci95_lower
ciRatings['CI95_range'] = ciRatings['mean'] - ciRatings['CI95_lower']

# bar plot with CI95 lines
ciBarPlot = ciRatings['mean'].unstack(level=1).plot.bar(
    yerr=ciRatings['CI95_range'].unstack(level=1), capsize=4)
plt.show()
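The loop above can probably be avoided, since the half-width of the interval is just 1.96 * std / sqrt(count) applied column-wise. Below is a minimal sketch of that idea, assuming a normal approximation for the 95% interval. The small DataFrame is only a stand-in built from the first rows of the table above, and the names ci_half, means and errors are made up for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in for ciRatings: two conditions and two environments from the table above
index = pd.MultiIndex.from_product(
    [["c01", "c02"], ["CSNoisyLvl1", "CSNoisyLvl2"]],
    names=["condition", "envCond"],
)
ciRatings = pd.DataFrame(
    {
        "count": [40, 40, 40, 40],
        "mean": [4.875, 4.850, 2.825, 3.000],
        "std": [0.404304, 0.361620, 0.902631, 0.816497],
    },
    index=index,
)

# Half-width of the 95% CI (normal approximation), computed for all rows at once
ci_half = 1.96 * ciRatings["std"] / np.sqrt(ciRatings["count"])

# Selecting a Series before unstacking gives both frames the same envCond columns
means = ciRatings["mean"].unstack(level=1)
errors = ci_half.unstack(level=1)

# Grouped bars per condition, one error bar per bar
ax = means.plot.bar(yerr=errors, capsize=4)
ax.set_ylabel("mean rating")
```

Calling plt.show() afterwards displays the figure. For small sample sizes a t-distribution quantile may be a better multiplier than 1.96, but that is outside what the question asks.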
The link you provided suggests:
fig, ax = plt.subplots(figsize=(12,8))
(df['mean'].unstack(level=1)
    .plot.bar(yerr=df['std'].unstack(level=1) * 1.96,
              ax=ax, capsize=4)
)
plt.show()
Output:
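As to why the original attempt drew no error bars: as far as I can tell, when yerr is a DataFrame pandas looks up the errors for each plotted column by column label, and silently skips columns it cannot find. Unstacking a one-column DataFrame (double brackets) keeps the column name as an extra level, so the means and the errors end up with different labels. The sketch below only compares the labels and does not plot anything. The CI95_range values are rounded illustrative numbers, and the variable names are hypothetical.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["c01", "c02"], ["CSNoisyLvl1", "CSNoisyLvl2"]],
    names=["condition", "envCond"],
)
df = pd.DataFrame(
    {
        "mean": [4.875, 4.850, 2.825, 3.000],
        "CI95_range": [0.125, 0.112, 0.280, 0.253],  # illustrative values
    },
    index=index,
)

# Double brackets keep a DataFrame, so unstack() yields two-level column labels
# such as ('mean', 'CSNoisyLvl1') and ('CI95_range', 'CSNoisyLvl1')
means_2d = df[["mean"]].unstack()
errs_2d = df[["CI95_range"]].unstack()
shared_2d = set(means_2d.columns) & set(errs_2d.columns)

# Single brackets select a Series, so unstack() yields plain envCond labels
means_1d = df["mean"].unstack(level=1)
errs_1d = df["CI95_range"].unstack(level=1)
shared_1d = set(means_1d.columns) & set(errs_1d.columns)

print("shared labels, DataFrame selection:", len(shared_2d))
print("shared labels, Series selection:", len(shared_1d))
```

With no shared labels in the first case there is nothing for the plot to match, which would explain the missing error bars. In the second case every column has a matching error column.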