How do I sum rows inside of a pandas crosstab and make a new crosstab?

Question

I have data from an Excel sheet that I have summarized in a pandas crosstab. I would like to categorize the data further by summing related rows.

This is my crosstab:

class_of_orbit         Elliptical  GEO  LEO  MEO  All
users
Civil                           0    0   36    0   36
Civil/Government                0    0    2    0    2
Commercial                      3   99  412    0  514
Government                      9   14   38    0   61
Government/Civil                0    0   10    0   10
Government/Commercial           0    2   81    0   83
Government/Military             0    0    1    0    1
Military                        9   67   66    0  142
Military/Civil                  0    0    2    0    2
Military/Commercial             0    0    0   32   32
All                            21  182  648   32  883

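I only want 4 groups: civil, govt, commercial and military. If a name has "Government" in it, I want to sum all the rows that contain it. If a name has "Military" in it, I want to sum those rows into a single Military row, and so on.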

What is the best way to do this?

Answer 1

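Group by the first part of each name (the text before the `/`). Judging by the output, `df` here is the crosstab loaded as a flat DataFrame whose row labels ended up in a column named `class_of_orbit`: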

df.groupby(df.class_of_orbit.str.split('/').str.get(0)).sum()

                Elliptical  GEO  LEO  MEO  All
class_of_orbit
All                     21  182  648   32  883
Civil                    0    0   38    0   38
Commercial               3   99  412    0  514
Government               9   16  130    0  155
Military                 9   67   68   32  176
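For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the question's table as a flat DataFrame and applies the same grouping. The column name `users` for the row labels is an assumption (the answer above has them in a column called `class_of_orbit`), and the label column is dropped before summing because, depending on the pandas version, it may otherwise show up in the result as concatenated strings.

```python
import pandas as pd

# The crosstab from the question, rebuilt as a flat DataFrame. The row labels
# live in an ordinary column; naming it `users` is an assumption.
columns = ["users", "Elliptical", "GEO", "LEO", "MEO", "All"]
rows = [
    ("Civil", 0, 0, 36, 0, 36),
    ("Civil/Government", 0, 0, 2, 0, 2),
    ("Commercial", 3, 99, 412, 0, 514),
    ("Government", 9, 14, 38, 0, 61),
    ("Government/Civil", 0, 0, 10, 0, 10),
    ("Government/Commercial", 0, 2, 81, 0, 83),
    ("Government/Military", 0, 0, 1, 0, 1),
    ("Military", 9, 67, 66, 0, 142),
    ("Military/Civil", 0, 0, 2, 0, 2),
    ("Military/Commercial", 0, 0, 0, 32, 32),
    ("All", 21, 182, 648, 32, 883),
]
df = pd.DataFrame(rows, columns=columns)

# Keep only the text before the first '/' and use it as the group key.
first_part = df["users"].str.split("/").str.get(0)

# Sum the numeric columns for rows that share the same first part.
grouped = df.drop(columns="users").groupby(first_part).sum()
print(grouped)
```

The `All` row has no slash, so it forms its own group and passes through unchanged.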

Answer 2

`pd.crosstab`

Starting from scratch, from the raw data:

pd.crosstab(df.users.str.split('/').str[0], df.class_of_orbit)
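As a rough illustration of the from-scratch route, the sketch below runs the same call on a handful of made-up records. The sample rows are invented for the example and are not the question's data; `margins=True` is added on the assumption that the question's table, which shows an `All` row and column, was built that way.

```python
import pandas as pd

# Made-up raw records, one row per satellite (not the question's data).
df = pd.DataFrame({
    "users": [
        "Civil", "Civil/Government", "Commercial", "Government/Civil",
        "Military/Commercial", "Military", "Commercial",
    ],
    "class_of_orbit": ["LEO", "LEO", "GEO", "LEO", "MEO", "GEO", "LEO"],
})

# Reduce each users label to its first part before cross-tabulating,
# so the combined categories never appear in the table at all.
main_user = df["users"].str.split("/").str[0]

# margins=True adds the 'All' row and column.
xtab = pd.crosstab(main_user, df["class_of_orbit"], margins=True)
print(xtab)
```

Because the labels are simplified before counting, the `All` margins are computed on the four groups directly and need no separate handling.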

`groupby`

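Building on what you already have: if you pass a callable to `groupby`, it is applied to each index label and the results are used as the group keys.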

xtab.groupby(lambda x: x.split('/')[0]).sum()

            Elliptical  GEO  LEO  MEO  All
All                 21  182  648   32  883
Civil                0    0   38    0   38
Commercial           3   99  412    0  514
Government           9   16  130    0  155
Military             9   67   68   32  176
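To make the callable form concrete, here is a small sketch on an excerpt of the question's table (four rows, two columns). It also shows what should be an equivalent spelling that derives the keys from the index with the `.str` accessor; the variable names are only illustrative.

```python
import pandas as pd

# Excerpt of the question's crosstab: four rows and two orbit classes.
xtab = pd.DataFrame(
    {"GEO": [14, 0, 67, 0], "LEO": [38, 10, 66, 2]},
    index=pd.Index(
        ["Government", "Government/Civil", "Military", "Military/Civil"],
        name="users",
    ),
)

# A callable passed to groupby is called once per index label;
# rows whose labels map to the same return value are summed together.
by_callable = xtab.groupby(lambda label: label.split("/")[0]).sum()

# The same keys, computed up front with vectorized string methods.
by_index = xtab.groupby(xtab.index.str.split("/").str[0]).sum()

print(by_callable)
```

Both forms group on the text before the first `/`, so they should produce the same sums.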

Answer 3

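I like Rafael's and piRSquared's answers, but if you want to sum every row in which a group appears anywhere in the name, not only the rows where it is the first part, you can modify piRSquared's answer slightly.

You can define a helper function that checks whether a name has a second part, then build a second DataFrame holding the sums of the rows whose names do have a second part. Add this element-wise to the result shown by Rafael and piRSquared. I left out the "All" row, but it can easily be computed from the resulting DataFrame.

Hope that's okay, I'm new here.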

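def second_parts_sum(x):
    # Return the part after the slash, or a placeholder for single-part names.
    parts = x.split('/')
    if len(parts) > 1:
        return parts[1]
    return 'to_be_dropped'

# Sum rows by the first part of each name, then by the second part.
first_parts = xtab.groupby(lambda x: x.split('/')[0]).sum()
second_parts = xtab.groupby(second_parts_sum).sum()
# Drop the 'All' margin row and the placeholder group.
first_parts = first_parts[first_parts.index != 'All']
second_parts = second_parts[second_parts.index != 'to_be_dropped']
# Element-wise sum, aligned on the index labels.
first_parts + second_parts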



            Elliptical  GEO  LEO  MEO  All
Civil                0    0   50    0   50
Commercial           3  101  493   32  629
Government           9   16  132    0  157
Military             9   67   69   32  177
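The same "count a row toward every group it mentions" idea can also be sketched with a substring test per group, which is closer to how the question phrases it. This is an alternative formulation rather than the code above; the `groups` list and variable names are assumptions, and the test is a plain substring match, so it only works while no group name is contained in another.

```python
import pandas as pd

# The crosstab from the question, with the users labels as the index.
xtab = pd.DataFrame(
    [
        (0, 0, 36, 0, 36),
        (0, 0, 2, 0, 2),
        (3, 99, 412, 0, 514),
        (9, 14, 38, 0, 61),
        (0, 0, 10, 0, 10),
        (0, 2, 81, 0, 83),
        (0, 0, 1, 0, 1),
        (9, 67, 66, 0, 142),
        (0, 0, 2, 0, 2),
        (0, 0, 0, 32, 32),
        (21, 182, 648, 32, 883),
    ],
    columns=["Elliptical", "GEO", "LEO", "MEO", "All"],
    index=pd.Index(
        ["Civil", "Civil/Government", "Commercial", "Government",
         "Government/Civil", "Government/Commercial", "Government/Military",
         "Military", "Military/Civil", "Military/Commercial", "All"],
        name="users",
    ),
)

# Leave the 'All' margin row out of the sums.
body = xtab.drop(index="All")

# For each target group, sum every row whose label mentions it anywhere.
groups = ["Civil", "Commercial", "Government", "Military"]
combined = pd.DataFrame(
    {g: body[body.index.str.contains(g, regex=False)].sum() for g in groups}
).T
print(combined)
```

Rows with two parts are counted once for each group, so the group totals no longer add up to the original grand total. Separately, `first_parts + second_parts` produces NaN for any group that is missing from one of the two frames; `first_parts.add(second_parts, fill_value=0)` is a variant that tolerates that case.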
