How to summarise data by percentages in pandas
This code:
#Missing analysis for actions - which action is missing the most action_types?
grouped_missing_analysis = pd.crosstab(clean_sessions.action_type, clean_sessions.action, margins=True).unstack()
grouped_unknown = grouped_missing_analysis.loc(axis=0)[slice(None), ['Missing', 'Unknown', 'Other']]
print(grouped_unknown)
prints this:
action                      action_type
10                          Missing           0
                            Unknown           0
11                          Missing           0
                            Unknown           0
12                          Missing           0
                            Unknown           0
15                          Missing           0
                            Unknown           0
about_us                    Missing           0
                            Unknown         416
accept_decline              Missing           0
                            Unknown           0
account                     Missing           0
                            Unknown        9040
acculynk_bin_check_failed   Missing           0
                            Unknown           1
acculynk_bin_check_success  Missing           0
                            Unknown          51
acculynk_load_pin_pad       Missing           0
                            Unknown          50
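How do I now aggregate the Missing, Unknown and Other counts for each action into a single total per action, expressed as a percentage of all action_types that are Missing, Unknown or Other? So, for example, there would be one row per action, and the about_us row would hold (416 + 0) / (total Missing + Unknown + Other across all actions).
For context, see .
The problem is that the output above includes a row at the bottom called All, which is the sum of everything, so:
All                         Missing     1126204
                            Unknown     1031170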
The desired output is:
action                      percent_total_missing_action_type
10                          0
11                          0
12                          0
15                          0
about_us                    416/total_missing_action_type (in the All row - 2157374, or the sum of everything in the action_type column)
accept_decline              0
account                     9040/total_missing_action_type (in the All row - 2157374, or the sum of everything in the action_type column)
acculynk_bin_check_failed   1/total_missing_action_type (in the All row - 2157374, or the sum of everything in the action_type column)
etc.
Here is some test data:
action  action_type
a       Missing         2
        Unknown         5
b       Missing         3
        Unknown         4
c       Missing         5
        Unknown         6
d       Missing         1
        Unknown         9
All     Missing        11
        Unknown        24
This should result in:
action  action_type_percentage
a       Missing      2/11
        Unknown      5/24
b       Missing      3/11
        Unknown      4/24
c       Missing      5/11
        Unknown      6/24
d       Missing      1/11
        Unknown      9/24
All     Missing     11/11
        Unknown     24/24
First, you can use xs to find the values of the MultiIndex under the key All, then divide the original Series by them. Finally, you can call reset_index:
print(df)
action  action_type
a       Missing         2
        Unknown         5
b       Missing         3
        Unknown         4
c       Missing         5
        Unknown         6
d       Missing         1
        Unknown         9
All     Missing        11
        Unknown        24
dtype: int64
print(df.xs('All'))
Missing    11
Unknown    24
dtype: int64
print(df / df.xs('All'))
action  action_type
a       Missing        0.181818
        Unknown        0.208333
b       Missing        0.272727
        Unknown        0.166667
c       Missing        0.454545
        Unknown        0.250000
d       Missing        0.090909
        Unknown        0.375000
All     Missing        1.000000
        Unknown        1.000000
dtype: float64
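The division above depends on pandas matching the totals to the Series on the shared action_type level. As a minimal self-contained sketch, the same step can be written with Series.div and an explicit level argument. The test data is rebuilt by hand from the question, and the names totals and shares are illustrative only, not part of the original answer.

```python
import pandas as pd

# Rebuild the question's test data as a Series with a two-level MultiIndex.
index = pd.MultiIndex.from_product(
    [["a", "b", "c", "d", "All"], ["Missing", "Unknown"]],
    names=["action", "action_type"],
)
df = pd.Series([2, 5, 3, 4, 5, 6, 1, 9, 11, 24], index=index)

# Totals per action_type, taken from the "All" margin row.
totals = df.xs("All")

# Divide every value by the total for its own action_type,
# broadcasting the totals across the "action_type" level.
shares = df.div(totals, level="action_type")
print(shares)
```

Naming the level makes the intent visible in the code and does not depend on how the index names happen to line up.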
print((df / df.xs('All')).reset_index().rename(columns={0: 'action_type_percentage'}))
  action action_type  action_type_percentage
0      a     Missing                0.181818
1      a     Unknown                0.208333
2      b     Missing                0.272727
3      b     Unknown                0.166667
4      c     Missing                0.454545
5      c     Unknown                0.250000
6      d     Missing                0.090909
7      d     Unknown                0.375000
8    All     Missing                1.000000
9    All     Unknown                1.000000
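The question also asks for one row per action, with that action's combined count as a share of the grand total in the All row. A rough sketch of that variant, assuming the same test data, might look like the following. The names grand_total, per_action and result are hypothetical, and the column name is borrowed from the desired output in the question.

```python
import pandas as pd

# Rebuild the question's test data as a Series with a two-level MultiIndex.
index = pd.MultiIndex.from_product(
    [["a", "b", "c", "d", "All"], ["Missing", "Unknown"]],
    names=["action", "action_type"],
)
df = pd.Series([2, 5, 3, 4, 5, 6, 1, 9, 11, 24], index=index)

# Grand total over all action_types, read from the "All" margin row.
grand_total = df.xs("All").sum()

# Sum the action_types within each action, leaving out the "All" row itself.
per_action = df.drop("All", level="action").groupby(level="action").sum()

# One row per action: its combined count as a fraction of the grand total.
result = (
    (per_action / grand_total)
    .rename("percent_total_missing_action_type")
    .reset_index()
)
print(result)
```

Dropping the All row before grouping keeps the margin out of the per-action sums, so the resulting fractions add up to 1.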