Groupby sum returns the wrong sum value because it has been multiplied in Pandas
Here is some sample code:
import pandas as pd
data = {
    'Date': ['10/10/21', '10/10/21', '13/10/21', '11/10/21', '11/10/21', '11/10/21', '11/10/21', '11/10/21', '13/10/21', '13/10/21', '13/10/21', '10/10/21', '10/10/21'],
    'ID': [1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
    'TotalTimeSpentInMinutes': [19, 6, 14, 17, 51, 53, 66, 19, 14, 28, 44, 22, 41],
    'Vehicle': ['V3', 'V1', 'V3', 'V1', 'V1', 'V1', 'V1', 'V1', 'V1', 'V1', 'V1', 'V1', 'V1'],
}
df = pd.DataFrame(data)
prices = {
    'V1': 9.99,
    'V2': 9.99,
    'V3': 14.00,
}
default_price = 9.99
df = df.sort_values('ID')
df['OrdersPD'] = df.groupby(['ID', 'Date', 'Vehicle'])['ID'].transform('count')
df['MinutesPD'] = df.groupby(['ID', 'Date', 'Vehicle'])['TotalTimeSpentInMinutes'].transform(sum)
df['HoursPD'] = df['MinutesPD'] / 60
df['Pay excl extra'] = df.apply(lambda x: prices.get(x['Vehicle'], default_price)*x['HoursPD'], axis=1).round(2)
extra = 1.20
df['Extra Pay'] = df.apply(lambda x: extra*x['OrdersPD'], axis=1)
df['Total_pay'] = df['Pay excl extra'] + df['Extra Pay'].round(2)
df['Total Pay PD'] = df.groupby(['ID'])['Total_pay'].transform(sum)
#Returns wrong sum
df['Total Courier Hours'] = df.groupby(['ID'])['HoursPD'].transform(sum)
#Returns wrong sum
df['ABS Final Pay'] = df.groupby(['ID'])['Total Pay PD'].transform(sum)
#Returns wrong sum
df.drop_duplicates((['ID','Date','Vehicle']), inplace=True)
print(df)
I'm trying to find two totals for each ID: hours and pay.
Here is my code for finding the total hours and the total pay.
Hours:
df['Total Courier Hours'] = df.groupby(['ID'])['HoursPD'].transform(sum)
#I've also tried with just .sum() but it returns an empty column
Pay:
df['ABS Final Pay'] = df.groupby(['ID'])['Total Pay PD'].transform(sum)
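On the comment above about plain `.sum()` giving an empty column, here is a minimal sketch of what is probably happening. The frame `toy` and its index labels are made up for illustration and are not part of the original data. `groupby(...).sum()` returns one value per ID, indexed by ID, and assigning that to a column aligns on index labels rather than on the ID column.

```python
import pandas as pd

# Hypothetical toy frame; the index labels 10, 11, 12 are chosen so that
# none of them coincide with an ID value.
toy = pd.DataFrame(
    {"ID": [1, 1, 2], "HoursPD": [0.1, 0.3, 1.05]},
    index=[10, 11, 12],
)

# .sum() collapses each group to a single value, indexed by ID (1 and 2).
totals = toy.groupby("ID")["HoursPD"].sum()

# Direct assignment aligns on index labels. Labels 10, 11 and 12 do not
# exist in `totals`, so every row ends up as NaN.
toy["direct"] = totals

# Mapping the ID column through `totals` looks each row up by its ID instead.
toy["mapped"] = toy["ID"].map(totals)

print(toy)
```

`transform` avoids the alignment problem because it returns a result with the same index as the original frame.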
Output example for ID 1: - ABS Final Pay
Date      ID  Vehicle  OrdersPD  HoursPD  PayExclExtra  ExtraPay
10/10/21  1   V1       1         0.1      1             1.20
10/10/21  1   V3       1         0.3166   4.43          1.20
13/10/21  1   V3       1         0.2333   3.27          1.20
Total_pay  Total Pay PD  Total Courier Hours  ABS Final Pay
2.20       12.30         0.65                 36.90
5.63       12.30         0.65                 36.90
4.47       12.30         0.65                 36.90
The two columns Total Courier Hours and ABS Final Pay are wrong, because the code currently calculates the totals like this:
ABS Final Pay = Total Pay PD * OrdersPD per count of ID
Example: for 10/10/21 - it does 12.30 * 2 = 24.60
for 13/10/21 - it does 12.30 * 1 = 12.30
ABS Final Pay returns 36.90
It should be 12.30 (7.83 + 4.47 from the 2 days)
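The multiplication described above can be reproduced in isolation. This is a small sketch using only the three Total_pay values reported for ID 1; the frame name `toy` is made up. `transform` writes the group total onto every row of the group, so summing that column a second time adds the total once per row.

```python
import pandas as pd

# The three Total_pay values reported for ID 1 in the output above.
toy = pd.DataFrame({
    "ID": [1, 1, 1],
    "Total_pay": [2.20, 5.63, 4.47],
})

# First pass: the group total (12.30) is broadcast onto all three rows.
toy["Total Pay PD"] = toy.groupby("ID")["Total_pay"].transform("sum")

# Second pass: summing the broadcast column gives 12.30 once per row,
# which is 3 * 12.30 = 36.90 rather than 12.30.
toy["ABS Final Pay"] = toy.groupby("ID")["Total Pay PD"].transform("sum")

print(toy.round(2))
```

In other words, a column produced by `transform` is already the total; it only needs summing again if the rows are first reduced to one per group.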
Total Pay PD for ID 1 is also wrong, because it should show the sum of pay for each date. Expected output example:
Date      ID  Vehicle  OrdersPD  Total PD
10/10/21  1   V1       1         7.83
10/10/21  1   V3       1         7.83
13/10/21  1   V1       1         4.47
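A possible way to get that per-date figure is to group by both ID and Date instead of ID alone. The sketch below assumes the frame already has exactly one row per ID, Date and Vehicle; the name `per_vehicle_day` is made up, and the values are copied from the ID 1 output shown earlier.

```python
import pandas as pd

# One row per (ID, Date, Vehicle), using the Total_pay values for ID 1.
per_vehicle_day = pd.DataFrame({
    "Date": ["10/10/21", "10/10/21", "13/10/21"],
    "ID": [1, 1, 1],
    "Vehicle": ["V1", "V3", "V3"],
    "Total_pay": [2.20, 5.63, 4.47],
})

# Grouping on ID and Date sums the vehicles used on the same day
# and leaves different days separate.
per_vehicle_day["Total PD"] = (
    per_vehicle_day.groupby(["ID", "Date"])["Total_pay"]
    .transform("sum")
    .round(2)
)

print(per_vehicle_day)
```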
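Total Courier Hours looks fine for ID 1, where the data is split into 3 rows with 1 order each, but when there is more than 1 order it is calculated incorrectly because the value gets multiplied.
ID 2 example - Total Courier Hours
This is the sum it calculates: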
Total Courier Hours = HoursPD * OrdersPD per count of ID
Example: 11/10/21 - ID 2 had 5 orders, 2.85 * 5 = 14.25
13/10/21 - 3 orders, 2.01 * 3 = 6.03
10/10/21 - 2 orders, 1.05 * 2 = 2.1
Total Courier Hours returns 22.38
It should be 5.91 (2.85 + 2.01 + 1.05 from the 3 days)
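For the hours, one option is to skip HoursPD entirely and sum the raw minutes per ID, since every order appears exactly once in that column. This is only a sketch; the names `orders` and `hours_per_courier` are made up, and the data is the ID and minutes columns from the sample code at the top.

```python
import pandas as pd

# ID and minutes columns copied from the sample data.
orders = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
    "TotalTimeSpentInMinutes": [19, 6, 14, 17, 51, 53, 66, 19, 14, 28, 44, 22, 41],
})

# Each order contributes its minutes once, so nothing is counted twice.
hours_per_courier = orders.groupby("ID")["TotalTimeSpentInMinutes"].sum() / 60

# Optional: put the per-courier total back on every row of the original frame.
orders["Total Courier Hours"] = orders["ID"].map(hours_per_courier)

print(hours_per_courier.round(2))
```

With the sample data this gives 39 minutes (0.65 hours) for ID 1 and 355 minutes (about 5.92 hours) for ID 2.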
Sorry for the long post. I hope this makes sense, and thanks in advance.
The drop_duplicates line was probably the problem. Once I removed this line:
df.drop_duplicates((['ID','Date','Vehicle']), inplace=True)
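I was able to calculate the totals row by row more accurately, without having to do calculations on columns in the code.
To keep things neatly separated, I printed the columns per groupby in different Excel sheets.
Example: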
per_courier = (
    df.groupby(['ID'])['Total Pay']
    .agg(sum)
)
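To make the idea above concrete, here is a sketch of one way to build the tables without `transform` and `drop_duplicates`. It is not necessarily what the final code looked like. It reuses the sample data and prices from the question; the names `per_day`, `rate`, `TotalCourierHours` and `FinalPay` are made up for illustration.

```python
import pandas as pd

data = {
    "Date": ["10/10/21", "10/10/21", "13/10/21", "11/10/21", "11/10/21",
             "11/10/21", "11/10/21", "11/10/21", "13/10/21", "13/10/21",
             "13/10/21", "10/10/21", "10/10/21"],
    "ID": [1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
    "TotalTimeSpentInMinutes": [19, 6, 14, 17, 51, 53, 66, 19, 14, 28, 44, 22, 41],
    "Vehicle": ["V3", "V1", "V3", "V1", "V1", "V1", "V1", "V1", "V1", "V1",
                "V1", "V1", "V1"],
}
df = pd.DataFrame(data)
prices = {"V1": 9.99, "V2": 9.99, "V3": 14.00}
default_price = 9.99
extra = 1.20

# Aggregate to one row per courier, day and vehicle, so no value is repeated.
per_day = (
    df.groupby(["ID", "Date", "Vehicle"], as_index=False)
    .agg(
        OrdersPD=("TotalTimeSpentInMinutes", "count"),
        MinutesPD=("TotalTimeSpentInMinutes", "sum"),
    )
)
per_day["HoursPD"] = per_day["MinutesPD"] / 60

# Hourly rate per vehicle, falling back to the default for unknown vehicles.
rate = per_day["Vehicle"].map(prices).fillna(default_price)
per_day["Total_pay"] = (rate * per_day["HoursPD"]).round(2) + extra * per_day["OrdersPD"]

# Summing the per-day rows counts every day exactly once per courier.
per_courier = (
    per_day.groupby("ID", as_index=False)
    .agg(TotalCourierHours=("HoursPD", "sum"), FinalPay=("Total_pay", "sum"))
)

print(per_day)
print(per_courier)
```

Each of the two frames could then be written to its own sheet with `to_excel(writer, sheet_name=...)` inside a `pd.ExcelWriter` context, which needs an Excel engine such as openpyxl installed.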