Plotting multiple density curves on the same plot: weighting the subset categories in Python 3

Question

I am trying to recreate this density plot in Python 3: math.stackexchange.com/questions/845424/the-expected-outcome-of-a-random-game-of-chess

End Goal: I need my density plot to look like this

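The area under the blue curve equals the sum of the areas under the red, green, and purple curves, because the different outcomes (Draw, Black wins, and White wins) are subsets of the total (All).

How do I get Python to account for this and plot the curves accordingly?

Here is the .csv file of results_df after 1000 simulations: pastebin.com/YDVMx2DL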

from matplotlib import pyplot as plt
import seaborn as sns

black = results_df.loc[results_df['outcome'] == 'Black']
white = results_df.loc[results_df['outcome'] == 'White']
draw = results_df.loc[results_df['outcome'] == 'Draw']
win = results_df.loc[results_df['outcome'] != 'Draw']

Total = len(results_df.index)
Wins = len(win.index)

PercentBlack = "Black Wins ≈ %s" %('{0:.2%}'.format(len(black.index)/Total))
PercentWhite = "White Wins ≈ %s" %('{0:.2%}'.format(len(white.index)/Total))
PercentDraw = "Draw ≈ %s" %('{0:.2%}'.format(len(draw.index)/Total))
AllTitle = 'Distribution of Moves by All Outcomes (nSample = %s)' %(workers)

sns.distplot(results_df.moves, hist=False, label = "All")
sns.distplot(black.moves, hist=False, label=PercentBlack)
sns.distplot(white.moves, hist=False, label=PercentWhite)
sns.distplot(draw.moves, hist=False, label=PercentDraw)
plt.title(AllTitle)
plt.ylabel('Density')
plt.xlabel('Number of Moves')
plt.legend()
plt.show()

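The code above produces density curves without weights. I need to figure out how to weight the density curves accordingly and how to keep my labels in the legend.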

density curves, no weights; help

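I also tried frequency histograms, which scale the heights of the distributions correctly, but I would rather have the 4 curves overlaid on each other for a "cleaner" look. I don't like this frequency plot, but it is my current solution.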

results_df.moves.hist(alpha=0.4, bins=range(0, 700, 10), label = "All")
draw.moves.hist(alpha=0.4, bins=range(0, 700, 10), label = PercentDraw)
white.moves.hist(alpha=0.4, bins=range(0, 700, 10), label = PercentWhite)
black.moves.hist(alpha=0.4, bins=range(0, 700, 10), label = PercentBlack)
plt.title(AllTitle)
plt.ylabel('Frequency')
plt.xlabel('Number of Moves')
plt.legend()
plt.show()

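If anyone can write Python 3 code that outputs the first plot, with 4 density curves and the correct subset weights, while preserving the custom legend that shows the percentages, that would be much appreciated.

Once the density curves are plotted with the correct subset weights, I am also interested in Python 3 code that finds the coordinates of the maximum point of each density curve, so that it shows the most frequent number of moves once I scale this up to 500,000 iterations.

Thanks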

Answer 1

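You need to be careful here. The plot you produced is correct: every curve shown is a probability density function of its underlying distribution.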

In the plot you want, only the curve labeled "All" is a probability density function. The other curves are not.
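
One way to state this, using notation introduced here rather than taken from the question: let $p_k = n_k / N$ be the share of games ending in outcome $k$, and $f_k$ the density of the number of moves within that outcome. The desired picture then corresponds to:

```latex
f_{\text{All}}(x) = \sum_{k \in \{\text{Draw},\,\text{White},\,\text{Black}\}} p_k \, f_k(x),
\qquad
\int p_k \, f_k(x)\, dx = p_k \le 1 .
```

Each scaled curve $p_k f_k$ integrates to $p_k$ rather than to 1, which is why it is not a probability density function, while the three areas together add up to the area under "All".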

In any case, if you want to scale the kernel density estimates as shown in the desired plot, you need to compute them yourself. This can be done with scipy.stats.gaussian_kde().
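
As a quick sanity check of the scaling idea, the sketch below integrates a gaussian_kde estimate before and after multiplying it by the subset's share. The data is synthetic and the total sample size `n_total` is a hypothetical value chosen for illustration, not taken from the question.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)

# Synthetic stand-in for the move counts of one outcome (assumption).
subset = rng.gumbel(80, 25, 1000)
n_total = 5000  # hypothetical size of the full sample

kde = scipy.stats.gaussian_kde(subset)
weight = len(subset) / n_total  # share of the subset in the full sample

# integrate_box_1d integrates the estimated density over an interval.
area_pdf = kde.integrate_box_1d(-np.inf, np.inf)
area_weighted = weight * area_pdf

print("area under the KDE:       ", area_pdf)
print("area under the scaled KDE:", area_weighted)
```

The unscaled estimate has unit area, so the scaled curve's area is just the subset's share of the sample.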

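To reproduce the desired plot, I see two options.

Compute the KDE for every case involved, including the combined data, and scale each subset's KDE by its share of the samples.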

import numpy as np; np.random.seed(0)
import matplotlib.pyplot as plt
import scipy.stats

a = np.random.gumbel(80, 25, 1000).astype(int)
b = np.random.gumbel(200, 46, 4000).astype(int)

kdea = scipy.stats.gaussian_kde(a)
kdeb = scipy.stats.gaussian_kde(b)

both = np.hstack((a,b))
kdeboth = scipy.stats.gaussian_kde(both)
grid = np.arange(500)

#weighted kde curves
wa = kdea(grid)*(len(a)/float(len(both)))
wb = kdeb(grid)*(len(b)/float(len(both)))

print("a.sum ", wa.sum())
print("b.sum ", wb.sum())
print("total.sum ", kdeboth(grid).sum())

fig, ax = plt.subplots()
ax.plot(grid, wa, lw=1, label = "weighted a")
ax.plot(grid, wb, lw=1, label = "weighted b")
ax.plot(grid, kdeboth(grid), color="crimson", lw=2, label = "pdf")

plt.legend()
plt.show()
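
Applied to the question's data, this first option might look roughly like the sketch below. It assumes a DataFrame with the `outcome` and `moves` columns used in the question. The helper name `weighted_kde_curves`, the synthetic stand-in data, and the use of `len(results_df)` in the title (the question uses its own `workers` variable) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats


def weighted_kde_curves(results_df, grid):
    """Return {legend label: curve}; each subset KDE is scaled by its share."""
    total = len(results_df)
    all_kde = scipy.stats.gaussian_kde(results_df["moves"].astype(float))
    curves = {"All": all_kde(grid)}
    for outcome, group in results_df.groupby("outcome"):
        share = len(group) / total
        # Same legend format as in the question.
        if outcome == "Draw":
            label = "Draw ≈ {:.2%}".format(share)
        else:
            label = "{} Wins ≈ {:.2%}".format(outcome, share)
        kde = scipy.stats.gaussian_kde(group["moves"].astype(float))
        curves[label] = kde(grid) * share
    return curves


# Synthetic stand-in for results_df (assumption: real data comes from the CSV).
rng = np.random.default_rng(0)
results_df = pd.DataFrame({
    "outcome": ["Draw"] * 700 + ["White"] * 160 + ["Black"] * 140,
    "moves": np.concatenate([
        rng.gumbel(330, 60, 700),
        rng.gumbel(150, 50, 160),
        rng.gumbel(160, 50, 140),
    ]).astype(int),
})

grid = np.arange(0, 700)
curves = weighted_kde_curves(results_df, grid)

fig, ax = plt.subplots()
for label, curve in curves.items():
    ax.plot(grid, curve, label=label)
ax.set_title("Distribution of Moves by All Outcomes (nSample = %s)" % len(results_df))
ax.set_xlabel("Number of Moves")
ax.set_ylabel("Density")
ax.legend()
```

Calling plt.show() afterwards displays the figure. Because each subset uses its own bandwidth, the scaled curves only approximately add up to the "All" curve.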

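Compute the KDE of each individual case, scale each by its share of the samples, and sum the scaled curves to obtain the total.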

import numpy as np; np.random.seed(0)
import matplotlib.pyplot as plt
import scipy.stats

a = np.random.gumbel(80, 25, 1000).astype(int)
b = np.random.gumbel(200, 46, 4000).astype(int)

kdea = scipy.stats.gaussian_kde(a)
kdeb = scipy.stats.gaussian_kde(b)

grid = np.arange(500)


#weighted kde curves
wa = kdea(grid)*(len(a)/float(len(a)+len(b)))
wb = kdeb(grid)*(len(b)/float(len(a)+len(b)))

total = wa+wb

fig, ax = plt.subplots(figsize=(5,3))
ax.plot(grid, wa, lw=1, label = "weighted a")
ax.plot(grid, wb, lw=1, label = "weighted b")
ax.plot(grid, total, color="crimson", lw=2, label = "pdf")

plt.legend()
plt.show()
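
The question also asks for the coordinates of the maximum of each curve. A minimal sketch, assuming the curves are evaluated on a grid as above, is to take the grid point where each curve is largest. The helper name `peak` is introduced here for illustration, and the located maximum is only as precise as the grid spacing.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)

# Synthetic stand-ins with the same parameters as the examples above (assumption).
a = rng.gumbel(80, 25, 1000)
b = rng.gumbel(200, 46, 4000)
n = len(a) + len(b)

grid = np.arange(500)
curves = {
    "weighted a": scipy.stats.gaussian_kde(a)(grid) * len(a) / n,
    "weighted b": scipy.stats.gaussian_kde(b)(grid) * len(b) / n,
}
curves["pdf"] = curves["weighted a"] + curves["weighted b"]


def peak(grid, curve):
    """Return (x, y) of the highest point of a curve sampled on grid."""
    i = np.argmax(curve)
    return grid[i], curve[i]


peaks = {label: peak(grid, curve) for label, curve in curves.items()}
for label, (x, y) in peaks.items():
    print("%s: peak at x = %d, height = %.5f" % (label, x, y))
```

Scaling a curve by a constant changes the height of its peak but not its x position, so the most frequent number of moves per outcome is the same with or without weights.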

plot

matplotlib

python-3.x

density-plot

seaborn