Python pandas: pivot_table simple string aggregation and sort

I am trying to do something with pandas that is very simple in an Excel pivot table.

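As far as I can tell, the following code looks logical, but it does not work. Above all, I would like to know how complicated it should be to implement such a simple aggregation. Any suggestions?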

pt = pd.pivot_table(data=df,
                    aggfunc='count',
                    index=["root_name", "rca"],
                    values=["rca"],
                    margins=True).sort_values(['rca'], ascending=[False])

Edit: sample input data and output

try:
    from io import StringIO  # Python 3
except ImportError:
    from StringIO import StringIO  # Python 2

import pandas as pd

TESTDATA = u"""root_name;rca
Mobile Voice;mib manual manipulation
Mobile Voice;mib manual manipulation
Internet;dq
Mobile Voice;defect
Internet;mnp
Mobile Voice;mnp
Mobile Voice;defect
Mobile Voice;ceased in mib before dqt run
Mobile Voice;mnp
Mobile Voice;ceased in mib before dqt run
Internet;dq
Mobile Voice;mnp
Mobile Voice;dq
Mobile Voice;no dq
Mobile Voice;no dq
Mobile Voice;asset ceased while order was pending
Internet;dq
Mobile Voice;no dq
Internet;mnp
Mobile Voice;mnp
Mobile Voice;salto replication delay
Mobile Voice;provide order created dq
Internet;mnp
Mobile Voice;mib manual manipulation
Mobile Voice;mnp
Mobile Voice;mnp
Mobile Voice;ceased in mib before dqt run
Mobile Voice;mnp
Mobile Voice;mib manual manipulation
"""

df = pd.read_csv(StringIO(TESTDATA), sep=';', usecols= ['root_name', 'rca'], engine='python')

pt = pd.pivot_table(data=df,
               aggfunc = 'count',
               index = ["root_name", "rca"],
               values = ["rca"],
               margins = True)


print (pt.sort_values(['rca'], 
               ascending=[False]))

Result:

Empty DataFrame
Columns: []
Index: [(Mobile Voice, salto replication delay), (Mobile Voice, provide order created dq), (Mobile Voice, no dq), (Internet, mnp), (Mobile Voice, mnp), (Mobile Voice, mib manual manipulation), (Internet, dq), (Mobile Voice, dq), (Mobile Voice, defect), (Mobile Voice, ceased in mib before dqt run), (Mobile Voice, asset ceased while order was pending), (All, )]

Try adding a 'count' field to your dataframe, then group and use the count() method:

df['count'] = 1
# groupby already sorts by its keys; sort_index() keeps the (root_name, rca) order explicit
df.groupby(by=['root_name', 'rca']).count().sort_index()

Output:

                                                   count
root_name    rca                                        
Internet     dq                                        3
             mnp                                       3
Mobile Voice asset ceased while order was pending      1
             ceased in mib before dqt run              3
             defect                                    2
             dq                                        1
             mib manual manipulation                   4
             mnp                                       7
             no dq                                     3
             provide order created dq                  1
             salto replication delay                   1
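As a possible alternative that skips the helper column, `groupby(...).size()` counts the rows in each group directly. This is a minimal sketch on a small stand-in frame (a hypothetical subset of the sample rows), not the full data from the question.

```python
import pandas as pd

# Stand-in for the question's df (hypothetical subset, for illustration only).
df = pd.DataFrame({
    "root_name": ["Mobile Voice", "Mobile Voice", "Internet", "Internet", "Mobile Voice"],
    "rca": ["mnp", "mnp", "dq", "mnp", "defect"],
})

# size() counts rows per (root_name, rca) pair; no extra 'count' column is needed.
counts = df.groupby(["root_name", "rca"]).size().rename("count").to_frame()
print(counts)
```

The result has the same shape as the grouped table above: a two-level index and a single `count` column.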

A small tweak to reproduce the Excel result by adding a 'sum' field per 'root_name':

# Sum only the numeric 'count' column so the string column 'rca' is left out of the totals
grouped_sum = df.groupby(by='root_name')[['count']].sum().reset_index()
grouped = df.merge(grouped_sum, how='left', on='root_name')
grouped.rename(columns={'count_x': 'count', 'count_y': 'sum'}, inplace=True)
grouped
       root_name                                   rca  count  sum
0   Mobile Voice               mib manual manipulation      1   23
1   Mobile Voice               mib manual manipulation      1   23
2       Internet                                    dq      1    6

pd.pivot_table(
           data=grouped,
           aggfunc=['count'],
           index=[ "root_name", "sum", "rca"],
           values=["count"],
           margins=True).sort_values(["sum", 'root_name', 'rca'], 
           ascending=[False, True, True]
)

Output:

                                                      count
                                                      count
root_name    sum rca                                       
All                                                      29
Mobile Voice 23  asset ceased while order was pending     1
                 ceased in mib before dqt run             3
                 defect                                   2
                 dq                                       1
                 mib manual manipulation                  4
                 mnp                                      7
                 no dq                                    3
                 provide order created dq                 1
                 salto replication delay                  1
Internet     6   dq                                       3
                 mnp                                      3
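The same layout can probably be reached without the merge and the second pivot. The sketch below is one way to do it under stated assumptions: it uses a small stand-in frame (a hypothetical subset of the sample rows), broadcasts each group total with `transform("sum")`, and sorts by the total and then by the count, both descending. It leaves out the `All` margin row, and the choice to sort each group's rows by count is an assumption about the desired Excel ordering.

```python
import pandas as pd

# Stand-in for the question's df (hypothetical subset, for illustration only).
df = pd.DataFrame({
    "root_name": ["Mobile Voice", "Mobile Voice", "Internet", "Internet", "Mobile Voice"],
    "rca": ["mnp", "mnp", "dq", "mnp", "defect"],
})

# Rows per (root_name, rca) pair, as a flat frame.
counts = df.groupby(["root_name", "rca"]).size().rename("count").reset_index()

# Total rows per root_name, repeated on every row of that group.
counts["sum"] = counts.groupby("root_name")["count"].transform("sum")

# Largest root_name first; within a root_name, most frequent rca first.
result = (
    counts.sort_values(["sum", "count"], ascending=[False, False])
          .set_index(["root_name", "sum", "rca"])
)
print(result)
```

Sorting on plain columns before setting the index avoids mixing the numeric `sum` values with the empty labels of a margin row.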