python

Question

I have a dataframe of video game titles released across multiple platforms, along with their total sales. It looks like this:

    name                        total_sales platform
0   Frozen: Olaf's Quest            0.51    DS
1   Frozen: Olaf's Quest            0.59    3DS
2   007: Quantum of Solace          0.02    PC
3   007: Quantum of Solace          0.13    DS
4   007: Quantum of Solace          0.43    PS2
5   007: Quantum of Solace          0.65    Wii
6   007: Quantum of Solace          1.15    PS3
7   007: Quantum of Solace          1.48    X360
8   007: The World is not Enough    0.92    PS
9   007: The World is not Enough    1.56    N64
10  11eyes: CrossOver               0.02    PSP
11  18 Wheeler: American Pro Truc   0.11    GC
12  18 Wheeler: American Pro Truc   0.40    PS2
13  187: Ride or Die                0.06    XB
14  187: Ride or Die                0.15    PS2
15  2 in 1 Combo Pack: Sonic Heroes 0.11    X360
16  2 in 1 Combo Pack: Sonic Heroes 0.53    XB
17  2002 FIFA World Cup             0.05    GC
18  2002 FIFA World Cup             0.19    XB
19  2002 FIFA World Cup             0.60    PS2

I am using the following to organize the dataframe:

df = yearly_sales.groupby(['name','total_sales']).last()
df = yearly_sales.reset_index()

I then plot it as a seaborn scatter plot:

sns.scatterplot(data=yearly_sales, x="total_sales", y="name")

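Right now it won't plot by name (I'm guessing because there are 7,400 values), so I thought I would try calculating the deviation between platforms: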

df.groupby(['name','platform'])['total_sales'].std()

However, this mostly gives me NaN values, since hardly any game spans all platforms.

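I'm not sure what my next step should be. Ultimately, what I want to show is how each game's total sales vary across platforms. I'm not even entirely convinced that I've started off the right way.

Any input would be greatly appreciated!

Thanks in advance for your time,

Jared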

Answer 1

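I think a histplot would be a better way to visualize this, given that "what I want to show is how each game's total sales vary across platforms". It shows how many games fall into each 0.1-wide bin of standard deviation, with the standard deviation computed per game. You can pass ddof=0 to avoid returning NaN values, but note that this changes the standard deviation for all values, because it computes the population rather than the sample standard deviation.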

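import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

plt.style.use('dark_background')
fig, ax = plt.subplots(dpi=150)
# Population standard deviation (ddof=0) of sales per game; single-platform games get 0 instead of NaN
df = df[['name', 'total_sales']].groupby('name', as_index=False).std(ddof=0)
# Histogram of the per-game deviations in bins of width 0.1, drawn on the axes created above
sns.histplot(data=df, x='total_sales', kde=True, bins=np.arange(0, 1, 0.1), ax=ax)
# Keep the count axis on integer ticks
ax.yaxis.set_major_locator(MaxNLocator(integer=True))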
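
The remark about ddof can be checked on a handful of rows. The following is only a minimal sketch: the `sample` frame is an assumption built from a few rows of the question's table, not the full `yearly_sales` data, and the variable names are made up for illustration. It suggests why grouping by both name and platform yields only NaN (each group holds a single row), and how ddof=0 turns the remaining NaN values into 0 while also shrinking every other value.

```python
import pandas as pd

# A few rows copied from the question's sample table (illustrative subset only)
sample = pd.DataFrame({
    "name": ["Frozen: Olaf's Quest", "Frozen: Olaf's Quest",
             "11eyes: CrossOver",
             "187: Ride or Die", "187: Ride or Die"],
    "total_sales": [0.51, 0.59, 0.02, 0.06, 0.15],
    "platform": ["DS", "3DS", "PSP", "XB", "PS2"],
})

# Grouping by name AND platform leaves one row per group, and the
# sample standard deviation (ddof=1, the default) of one value is NaN.
per_name_platform = sample.groupby(["name", "platform"])["total_sales"].std()

# Grouping by name only: NaN remains just for single-platform games.
sample_std = sample.groupby("name")["total_sales"].std()

# ddof=0 gives the population standard deviation: 0 for single-platform
# games, and a smaller value than ddof=1 for every multi-platform game.
population_std = sample.groupby("name")["total_sales"].std(ddof=0)

print(per_name_platform.isna().all())
print(sample_std)
print(population_std)
```

Whether a single-platform game should count as having zero deviation, or be dropped from the plot instead (for example with `dropna()` on the ddof=1 result), is a judgment call that depends on what the chart is meant to show.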


python - scatter plot issue - not sure how to structure the plot for the results i want?

scatter-plot

standard-deviation

pandas

seaborn