Python pandas : pivot_table 简单的字符串聚合和排序
Python pandas : pivot_table simple string aggregation and sort
我正在尝试使用 pandas 实现某些功能,这在 Excel 数据透视表中非常简单:
据我所见,以下代码似乎符合逻辑,但它不起作用。最重要的是,我想知道实现这样一个简单的聚合会有多复杂。有什么建议吗?
pt = pd.pivot_table(data=df,
aggfunc = 'count',
index = ["root_name", "rca"],
values = ["rca"],
margins = True).sort_values(['rca'],
ascending=[False])
编辑:示例输入数据和输出
try: from io import StringIO # Python 3
except: from StringIO import StringIO # Python 2
import pandas as pd
TESTDATA = u"""root_name;rca
Mobile Voice;mib manual manipulation
Mobile Voice;mib manual manipulation
Internet;dq
Mobile Voice;defect
Internet;mnp
Mobile Voice;mnp
Mobile Voice;defect
Mobile Voice;ceased in mib before dqt run
Mobile Voice;mnp
Mobile Voice;ceased in mib before dqt run
Internet;dq
Mobile Voice;mnp
Mobile Voice;dq
Mobile Voice;no dq
Mobile Voice;no dq
Mobile Voice;asset ceased while order was pending
Internet;dq
Mobile Voice;no dq
Internet;mnp
Mobile Voice;mnp
Mobile Voice;salto replication delay
Mobile Voice;provide order created dq
Internet;mnp
Mobile Voice;mib manual manipulation
Mobile Voice;mnp
Mobile Voice;mnp
Mobile Voice;ceased in mib before dqt run
Mobile Voice;mnp
Mobile Voice;mib manual manipulation
"""
df = pd.read_csv(StringIO(TESTDATA), sep=';', usecols= ['root_name', 'rca'], engine='python')
pt = pd.pivot_table(data=df,
aggfunc = 'count',
index = ["root_name", "rca"],
values = ["rca"],
margins = True)
print (pt.sort_values(['rca'],
ascending=[False]))
结果:
空数据框
列: []
Index: [(Mobile Voice, salto replication delay), (Mobile Voice, provide order created dq), (Mobile Voice, no dq), (Internet, mnp), (Mobile Voice, mnp), (Mobile Voice, mib manual manipulation ), (Internet, dq), (Mobile Voice, dq), (Mobile Voice, defect), (Mobile Voice, stopped in mib before dqt 运行), (Mobile Voice, asset stopped while order pending), (全部, )]
尝试将 'count' 字段添加到您的数据框中,然后使用 count() 方法进行分组:
df['count'] = 1
df.groupby(by=['root_name', 'rca']).count().sort_index('rca')
输出:
count
root_name rca
Internet dq 3
mnp 3
Mobile Voice asset ceased while order was pending 1
ceased in mib before dqt run 3
defect 2
dq 1
mib manual manipulation 4
mnp 7
no dq 3
provide order created dq 1
salto replication delay 1
通过添加“root_name”和字段来重现 excel 结果的小调整:
grouped_sum = df.groupby(by='root_name').sum().reset_index(level=[0])
grouped = df.merge(grouped_sum, how='left', on='root_name')
grouped.rename(columns={'count_x': 'count', 'count_y': 'sum'}, inplace=True)
grouped
root_name rca count sum
0 Mobile Voice mib manual manipulation 1 23
1 Mobile Voice mib manual manipulation 1 23
2 Internet dq 1 6
pd.pivot_table(
data=grouped,
aggfunc=['count'],
index=[ "root_name", "sum", "rca"],
values=["count"],
margins=True).sort_values(["sum", 'root_name', 'rca'],
ascending=[False, True, True]
)
输出:
count
count
root_name sum rca
All 29
Mobile Voice 23 asset ceased while order was pending 1
ceased in mib before dqt run 3
defect 2
dq 1
mib manual manipulation 4
mnp 7
no dq 3
provide order created dq 1
salto replication delay 1
Internet 6 dq 3
mnp 3
我正在尝试使用 pandas 实现某些功能,这在 Excel 数据透视表中非常简单:
据我所见,以下代码似乎符合逻辑,但它不起作用。最重要的是,我想知道实现这样一个简单的聚合会有多复杂。有什么建议吗?
pt = pd.pivot_table(data=df,
aggfunc = 'count',
index = ["root_name", "rca"],
values = ["rca"],
margins = True).sort_values(['rca'],
ascending=[False])
编辑:示例输入数据和输出
try: from io import StringIO # Python 3
except: from StringIO import StringIO # Python 2
import pandas as pd
TESTDATA = u"""root_name;rca
Mobile Voice;mib manual manipulation
Mobile Voice;mib manual manipulation
Internet;dq
Mobile Voice;defect
Internet;mnp
Mobile Voice;mnp
Mobile Voice;defect
Mobile Voice;ceased in mib before dqt run
Mobile Voice;mnp
Mobile Voice;ceased in mib before dqt run
Internet;dq
Mobile Voice;mnp
Mobile Voice;dq
Mobile Voice;no dq
Mobile Voice;no dq
Mobile Voice;asset ceased while order was pending
Internet;dq
Mobile Voice;no dq
Internet;mnp
Mobile Voice;mnp
Mobile Voice;salto replication delay
Mobile Voice;provide order created dq
Internet;mnp
Mobile Voice;mib manual manipulation
Mobile Voice;mnp
Mobile Voice;mnp
Mobile Voice;ceased in mib before dqt run
Mobile Voice;mnp
Mobile Voice;mib manual manipulation
"""
df = pd.read_csv(StringIO(TESTDATA), sep=';', usecols= ['root_name', 'rca'], engine='python')
pt = pd.pivot_table(data=df,
aggfunc = 'count',
index = ["root_name", "rca"],
values = ["rca"],
margins = True)
print (pt.sort_values(['rca'],
ascending=[False]))
结果: 空数据框 列: [] Index: [(Mobile Voice, salto replication delay), (Mobile Voice, provide order created dq), (Mobile Voice, no dq), (Internet, mnp), (Mobile Voice, mnp), (Mobile Voice, mib manual manipulation ), (Internet, dq), (Mobile Voice, dq), (Mobile Voice, defect), (Mobile Voice, stopped in mib before dqt 运行), (Mobile Voice, asset stopped while order pending), (全部, )]
尝试将 'count' 字段添加到您的数据框中,然后使用 count() 方法进行分组:
df['count'] = 1
df.groupby(by=['root_name', 'rca']).count().sort_index('rca')
输出:
count
root_name rca
Internet dq 3
mnp 3
Mobile Voice asset ceased while order was pending 1
ceased in mib before dqt run 3
defect 2
dq 1
mib manual manipulation 4
mnp 7
no dq 3
provide order created dq 1
salto replication delay 1
通过添加“root_name”和字段来重现 excel 结果的小调整:
grouped_sum = df.groupby(by='root_name').sum().reset_index(level=[0])
grouped = df.merge(grouped_sum, how='left', on='root_name')
grouped.rename(columns={'count_x': 'count', 'count_y': 'sum'}, inplace=True)
grouped
root_name rca count sum
0 Mobile Voice mib manual manipulation 1 23
1 Mobile Voice mib manual manipulation 1 23
2 Internet dq 1 6
pd.pivot_table(
data=grouped,
aggfunc=['count'],
index=[ "root_name", "sum", "rca"],
values=["count"],
margins=True).sort_values(["sum", 'root_name', 'rca'],
ascending=[False, True, True]
)
输出:
count
count
root_name sum rca
All 29
Mobile Voice 23 asset ceased while order was pending 1
ceased in mib before dqt run 3
defect 2
dq 1
mib manual manipulation 4
mnp 7
no dq 3
provide order created dq 1
salto replication delay 1
Internet 6 dq 3
mnp 3