如何按 pandas 中的百分比汇总数据

How to summarise data by percentages in pandas

此代码:

  #Missing analysis for actions - which action is missing the most action_types?
    grouped_missing_analysis = pd.crosstab(clean_sessions.action_type, clean_sessions.action, margins=True).unstack()
    grouped_unknown = grouped_missing_analysis.loc(axis=0)[slice(None), ['Missing', 'Unknown', 'Other']]
    print(grouped_unknown)

导致打印这个:

action                        action_type
10                            Missing              0
                              Unknown              0
11                            Missing              0
                              Unknown              0
12                            Missing              0
                              Unknown              0
15                            Missing              0
                              Unknown              0
about_us                      Missing              0
                              Unknown            416
accept_decline                Missing              0
                              Unknown              0
account                       Missing              0
                              Unknown           9040
acculynk_bin_check_failed     Missing              0
                              Unknown              1
acculynk_bin_check_success    Missing              0
                              Unknown             51
acculynk_load_pin_pad         Missing              0
                              Unknown             50

我现在如何将每个操作的总 MissingUnknownOther 汇总为每个操作的总价值计数,并以 [=20 的百分比表示=] action_types 是 MissingUnknown 还是 Other?因此,例如,每个操作都会有一行,而 about_us 行将有 406+0/Total Missing + Unknown + Other 用于所有操作。

有关上下文,请参阅

问题是上面的内容在它的底部包含一行,叫做 All,它是所有内容的总和,所以:

All                           Missing        1126204
                              Unknown        1031170

期望的输出是:

action                        percent_total_missing_action_type
10                            0
11                            0
12                            0
15                            0
about_us                     416/total_missing_action_type (in the All row - 2157374, or the sum of everything in the action_type column)
accept_decline                0
account                       9040/total_missing_action_type (in the All row - 2157374, or the sum of everything in the action_type column)
acculynk_bin_check_failed     1/total_missing_action_type (in the All row - 2157374, or the sum of everything in the action_type column)
etc..

这里是一些测试数据:

action                        action_type
    a                            Missing              2
                                 Unknown              5
    b                            Missing              3
                                 Unknown              4
    c                            Missing              5
                                 Unknown              6
    d                            Missing              1
                                 Unknown              9
    All                          Missing             11
                                 Unknown             24

这应该包括哪些内容:

     action                        action_type_percentage
    a                            Missing              2/11
                                 Unknown              5/24
    b                            Missing              3/11
                                 Unknown              4/24
    c                            Missing              5/11
                                 Unknown              6/24
    d                            Missing              1/11
                                 Unknown              9/24
    All                          Missing             11/11
                                 Unknown             24/24

首先,您可以通过 xs and then you can try it by original Series. Last you can reset_index:

使用键 All 找到 Multindex 的值
print df
action  action_type
a       Missing         2
        Unknown         5
b       Missing         3
        Unknown         4
c       Missing         5
        Unknown         6
d       Missing         1
        Unknown         9
All     Missing        11
        Unknown        24
dtype: int64

print df.xs('All')
Missing    11
Unknown    24
dtype: int64
action  action_type

print df / df.xs('All')
action  action_type
a       Missing        0.181818
        Unknown        0.208333
b       Missing        0.272727
        Unknown        0.166667
c       Missing        0.454545
        Unknown        0.250000
d       Missing        0.090909
        Unknown        0.375000
All     Missing        1.000000
        Unknown        1.000000
dtype: float64
print (df / df.xs('All')).reset_index().rename(columns={0:'action_type_percentage'})
  action action_type  action_type_percentage
0      a     Missing                0.181818
1      a     Unknown                0.208333
2      b     Missing                0.272727
3      b     Unknown                0.166667
4      c     Missing                0.454545
5      c     Unknown                0.250000
6      d     Missing                0.090909
7      d     Unknown                0.375000
8    All     Missing                1.000000
9    All     Unknown                1.000000