How to plot a distribution graph comparing subsets of different DataFrames

Question

I have two DataFrames, and I want to compare the distribution/density of a specific item in each one.

For example, if I want to compare 'Amy' and 'Andrew':

df1
        | Count |
ID     
Amy     |   5   |
Chris   |   4   |
Gabe    |   2   |

df2
        | Count |
ID     
Andrew  |   2   |
Chloe   |   3   |
Georgia |   1   |

I have calculated the following, but I am not sure how to plot it in a chart:

Amy_dist = df1.loc['Amy'] / df1.sum(axis=1)
Andrew_dist = df2.loc['Andrew'] / df2.sum(axis=1)

where Amy_dist = 45.5% and Andrew_dist = 33.3%

I don't know how to draw two bars comparing these two numbers. Any suggestions would be appreciated!

Answer 1

Try a bar chart like this?

import matplotlib.pyplot as plt

plt.bar(['Amy', 'Andrew'], [Amy_dist, Andrew_dist])
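
The call above expects Amy_dist and Andrew_dist to be plain numbers. The following is a minimal, self-contained sketch under two assumptions: the data is the sample from the question, and each share is the person's Count divided by the total of the Count column in the same DataFrame. The Agg backend and the axis label are also assumptions, added so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

# sample data from the question, with ID as the index
df1 = pd.DataFrame({'Count': [5, 4, 2], 'ID': ['Amy', 'Chris', 'Gabe']}).set_index('ID')
df2 = pd.DataFrame({'Count': [2, 3, 1], 'ID': ['Andrew', 'Chloe', 'Georgia']}).set_index('ID')

# share of each person within their own DataFrame, as a percentage (scalars)
Amy_dist = df1.loc['Amy', 'Count'] / df1['Count'].sum() * 100
Andrew_dist = df2.loc['Andrew', 'Count'] / df2['Count'].sum() * 100

# one bar per person
fig, ax = plt.subplots()
bars = ax.bar(['Amy', 'Andrew'], [Amy_dist, Andrew_dist])
ax.set_ylabel('Share of total Count (%)')
```

Selecting the 'Count' cell with .loc['Amy', 'Count'] yields a scalar, which is what plt.bar needs for a bar height.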

Answer 2

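It is more direct to combine the two sets of data, provided the 'ID' names are unique after combining; otherwise, create a new unique index before combining the DataFrames.
This way, there is no need to hard-code a specific person, as in Amy_dist = df1.loc['Amy'] / df1.sum(axis=1).
In the combined DataFrame, use .groupby to get the sum of the 'Count' column for each group, then divide each 'Count' by the sum of its group to get the percentage.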

import pandas as pd

# test data
df1 = pd.DataFrame({'Count': [5, 4, 2], 'ID': ['Amy', 'Chris', 'Gabe']}).set_index('ID')
df2 = pd.DataFrame({'Count': [2, 3, 1], 'ID': ['Andrew', 'Chloe', 'Georgia']}).set_index('ID')

# create a new column in each dataframe to identify where the data is from
df1['from'] = 1
df2['from'] = 2

# combine the dataframes
df = pd.concat([df1, df2])

# create a norm column, which is the percent of the total based on the group
df['norm'] = df.Count.div(df.groupby('from').Count.transform('sum')).mul(100).round(2)

# display(df)
         Count  from   norm
ID                         
Amy          5     1  45.45
Chris        4     1  36.36
Gabe         2     1  18.18
Andrew       2     2  33.33
Chloe        3     2  50.00
Georgia      1     2  16.67
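
If the same name can appear in both DataFrames, the combined 'ID' index is no longer unique. One possible way to get a unique index, sketched here as an assumption rather than as part of the original answer, is to pass keys= to pd.concat so the source of each row becomes an outer index level. The second DataFrame below is hypothetical and deliberately reuses the name 'Amy'.

```python
import pandas as pd

df1 = pd.DataFrame({'Count': [5, 4, 2], 'ID': ['Amy', 'Chris', 'Gabe']}).set_index('ID')
# hypothetical second frame that reuses the name 'Amy'
df2 = pd.DataFrame({'Count': [2, 3, 1], 'ID': ['Amy', 'Chloe', 'Georgia']}).set_index('ID')

# keys= adds an outer index level, so ('df1', 'Amy') and ('df2', 'Amy') stay distinct
df = pd.concat([df1, df2], keys=['df1', 'df2'], names=['from', 'ID'])

# percent of each row within its source frame
group_totals = df.groupby(level='from')['Count'].transform('sum')
df['norm'] = df['Count'].div(group_totals).mul(100).round(2)

# a specific person is then selected with a (source, name) tuple
amy_in_df1 = df.loc[('df1', 'Amy'), 'norm']
amy_in_df2 = df.loc[('df2', 'Amy'), 'norm']
```

With this layout the separate 'from' column is not needed, because the grouping key lives in the index.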

Plot all of the data

df.plot(y='norm', kind='bar', grid=True, legend=False)

Plot specific people

Use .loc to select the individuals

df.loc[['Amy', 'Andrew'], 'norm'].plot(kind='bar')
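
For reference, the selection step can be run end to end as follows. This is a sketch that rebuilds the combined DataFrame from the test data; the `people` list, the Agg backend, and the axis label are assumptions added for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no display is required
import pandas as pd

df1 = pd.DataFrame({'Count': [5, 4, 2], 'ID': ['Amy', 'Chris', 'Gabe']}).set_index('ID')
df2 = pd.DataFrame({'Count': [2, 3, 1], 'ID': ['Andrew', 'Chloe', 'Georgia']}).set_index('ID')
df1['from'] = 1
df2['from'] = 2
df = pd.concat([df1, df2])
df['norm'] = df['Count'].div(df.groupby('from')['Count'].transform('sum')).mul(100).round(2)

people = ['Amy', 'Andrew']  # hypothetical selection; any IDs present in the index work
selected = df.loc[people, 'norm']  # Series indexed by ID, in the order requested

ax = selected.plot(kind='bar', rot=0)
ax.set_ylabel('Percent of group total')
```

Changing the `people` list is the only edit needed to compare a different set of individuals.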

With `seaborn`

import seaborn as sns

sns.barplot(data=df.reset_index(), x='ID', y='norm', hue='from', dodge=False)
