Pivoting a Pandas DataFrame containing duplicate column names with different values
I have a question about pivot_table in Python pandas.
I have a DataFrame like this:
Agent Detail Value
report1 General Section YESS
report1 jobID 558
report1 Priority normal
report1 Run As Owner's Credentials
report1 Schedule Section
report1 disabled TRUE
report1 timeZoneId None
report1 startImmediately FALSE
report1 repeatMinuteInterval None
report1 start date None
report1 start time None
report1 Email Recipient abc@xyz.com
report1 Email Recipient xyz@sbc.com
report2 General Section YESS
report2 jobID 559
report2 Priority normal
report2 Run As Owner's Credentials
report2 Schedule Section
report2 disabled TRUE
report2 timeZoneId None
report2 startImmediately FALSE
report2 repeatMinuteInterval None
report2 start date None
report2 start time None
report2 Email Recipient abc123@xyz.com
report2 Email Recipient xyz11123@sbc.com
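I am trying to pivot the DataFrame so that every Detail value becomes a column. The index is the Agent field, which holds the report name. Each report can have multiple email recipients, and I need one row per recipient for each report. A sample of the desired output:
[image: sample output]
My current code is: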
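import pandas as pd
# Raw string so that "\t" in the Windows path is not read as a tab character
resultsFile = r'C:\Oracle\testfile.csv'  # input file to transpose
df = pd.read_csv(resultsFile, skip_blank_lines=True)
# aggfunc='sum' is the call in question: it merges repeated details into one cell
df2 = df.pivot_table(index='Agent', columns='Detail', values='Value', aggfunc='sum')
df2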
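This concatenates the email addresses into a single field, which is not what I am looking for. How can I pivot a df that has duplicate Detail values and turn them into multiple rows?
Thanks for your help.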
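For context, a minimal sketch of why the addresses end up in one cell: on an object (string) column, aggfunc='sum' applies `+` to the values, which for strings means concatenation. The three-row frame below is a cut-down, hand-typed stand-in for the CSV, not the real file, and the object dtype is set explicitly as an assumption about how the data is loaded.

```python
import pandas as pd

# Hypothetical cut-down sample of the data in the question
df = pd.DataFrame(
    {
        "Agent": ["report1", "report1", "report1"],
        "Detail": ["jobID", "Email Recipient", "Email Recipient"],
        "Value": ["558", "abc@xyz.com", "xyz@sbc.com"],
    },
    dtype=object,
)

# 'sum' on strings joins them, so both recipients land in a single cell
df2 = df.pivot_table(index="Agent", columns="Detail", values="Value", aggfunc="sum")
print(df2.loc["report1", "Email Recipient"])

# Counting rows per (Agent, Detail) pair shows which details repeat
counts = df.groupby(["Agent", "Detail"]).size()
print(counts[counts > 1])
```

The count at the end is a quick way to check which Detail values would need to be spread over several rows.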
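You can group your df by Agent and pivot each group, keeping the original index as the index. Because that produces one row per value, you then have to fill the NaN values and drop the duplicate rows: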
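reports = []
for a, sub_df in df.groupby('Agent'):
    # Keep the original row index (index omitted) so repeated details do not collide
    rep = sub_df.pivot(columns='Detail', values='Value').ffill().bfill().drop_duplicates()
    rep.insert(0, 'Agent', a)
    reports.append(rep)
# drop=True discards the original row labels and renumbers the rows from 0
result = pd.concat(reports).reset_index(drop=True)
print(result)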
Output:
Detail Agent Email Recipient General Section Priority Run As ... repeatMinuteInterval start date start time startImmediately timeZoneId
0 report1 abc@xyz.com YESS normal Owner's Credentials ... None None None FALSE None
1 report1 xyz@sbc.com YESS normal Owner's Credentials ... None None None FALSE None
2 report2 abc123@xyz.com YESS normal Owner's Credentials ... None None None FALSE None
3 report2 xyz11123@sbc.com YESS normal Owner's Credentials ... None None None FALSE None
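As a possible alternative to the loop (a sketch, not the approach from the answer above), the repeated details can be numbered with groupby().cumcount() so that a plain pivot works without any aggregation. The helper column name "n" and the shortened sample data are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical shortened sample of the data in the question
rows = [
    ("report1", "jobID", "558"),
    ("report1", "Priority", "normal"),
    ("report1", "Email Recipient", "abc@xyz.com"),
    ("report1", "Email Recipient", "xyz@sbc.com"),
    ("report2", "jobID", "559"),
    ("report2", "Priority", "normal"),
    ("report2", "Email Recipient", "abc123@xyz.com"),
    ("report2", "Email Recipient", "xyz11123@sbc.com"),
]
df = pd.DataFrame(rows, columns=["Agent", "Detail", "Value"])

# Number repeated (Agent, Detail) pairs: 0 for the first occurrence, 1 for the next
df["n"] = df.groupby(["Agent", "Detail"]).cumcount()

# (Agent, n) is unique per Detail, so pivot needs no aggregation function
wide = df.pivot(index=["Agent", "n"], columns="Detail", values="Value")

# Copy single-valued details down to the extra rows of the same Agent
wide = wide.groupby(level="Agent").ffill()

result = wide.reset_index().drop(columns="n")
result.columns.name = None
print(result)
```

Note that if several different details repeat within one report, this pairs them up by position rather than producing every combination, and the forward fill would also overwrite a genuinely missing value in a later row.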