Dataframe % column by grouping
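I am working on a forecast accuracy report that measures the deviation between the actual value and the previous projection. The measure is 1 - ('Actual' - 'M-1') / 'Actual'.
The measure needs to be grouped at different levels, e.g. 'Product Category' / 'Line' / 'Product'. However, df.groupby('Product Category').sum() does not work for the percentage column, because it adds up the per-row percentages instead of recomputing the ratio from the group totals. Does anyone know how this should be fixed? Thanks!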
import pandas as pd

data = {
    "Product Category": ['Drink', 'Drink', 'Drink', 'Food', 'Food', 'Food'],
    "Line": ['Water', 'Water', 'Wine', 'Fruit', 'Fruit', 'Fruit'],
    "Product": ['A', 'B', 'C', 'D', 'E', 'F'],
    "Actual": [100, 50, 40, 20, 70, 50],
    "M-1": [120, 40, 10, 20, 80, 50],
}
df = pd.DataFrame(data)
df['M1 Gap'] = df['Actual'] - df['M-1']
df['Error_Per'] = 1 - df['M1 Gap'] / df['Actual']
The expected output was posted as a screenshot (not reproduced here).
You should group first and compute the percentage afterwards:
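import pandas as pd

data = {
    "Product Category": ['Drink', 'Drink', 'Drink', 'Food', 'Food', 'Food'],
    "Line": ['Water', 'Water', 'Wine', 'Fruit', 'Fruit', 'Fruit'],
    "Product": ['A', 'B', 'C', 'D', 'E', 'F'],
    "Actual": [100, 50, 40, 20, 70, 50],
    "M-1": [120, 40, 10, 20, 80, 50],
}
df = pd.DataFrame(data)
df['M1 Gap'] = df['Actual'] - df['M-1']

# Sum only the numeric columns per group, then compute the ratio from the sums.
# Note: this ratio is the relative gap; the metric in the question is 1 minus this value.
num_cols = ['Actual', 'M-1', 'M1 Gap']
df_line = df.groupby('Line')[num_cols].sum()
df_line['Error_Per'] = df_line['M1 Gap'] / df_line['Actual']
print(df_line)

df_prod = df.groupby('Product Category')[num_cols].sum()
df_prod['Error_Per'] = df_prod['M1 Gap'] / df_prod['Actual']
print(df_prod)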
Output:
       Actual  M-1  M1 Gap  Error_Per
Line
Fruit     140  150     -10  -0.071429
Water     150  160     -10  -0.066667
Wine       40   10      30   0.750000

                  Actual  M-1  M1 Gap  Error_Per
Product Category
Drink                190  170      20   0.105263
Food                 140  150     -10  -0.071429
Note: the expected result in your screenshot does not match the dictionary in your code (which is what I used).
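Since the question asks for the same measure at several levels, the group-then-divide idea can be wrapped in a small helper. This is only a sketch: the name `accuracy_by` is made up for illustration, and it assumes the question's formula 1 - ('Actual' - 'M-1') / 'Actual' is the one wanted, so its values differ from the relative gap printed above.

```python
import pandas as pd

def accuracy_by(df, keys):
    """Hypothetical helper: sum the inputs per group, then derive the ratio from the sums."""
    out = df.groupby(keys)[["Actual", "M-1"]].sum()
    out["M1 Gap"] = out["Actual"] - out["M-1"]
    # Question's formula, applied to group totals rather than to individual rows.
    out["Error_Per"] = 1 - out["M1 Gap"] / out["Actual"]
    return out

data = {
    "Product Category": ["Drink", "Drink", "Drink", "Food", "Food", "Food"],
    "Line": ["Water", "Water", "Wine", "Fruit", "Fruit", "Fruit"],
    "Product": ["A", "B", "C", "D", "E", "F"],
    "Actual": [100, 50, 40, 20, 70, 50],
    "M-1": [120, 40, 10, 20, 80, 50],
}
df = pd.DataFrame(data)

by_category = accuracy_by(df, "Product Category")
by_line = accuracy_by(df, ["Product Category", "Line"])  # nested levels give a MultiIndex
print(by_category)
print(by_line)
```

Passing a list of keys gives the nested breakdown (category, then line) in one call, and the ratio is always recomputed from that level's own totals.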
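You can also write a custom function and apply it to each row of the grouped DataFrame, as shown below. Note that I set the axis parameter to 1 so that the function is applied to each row (that is, across the columns):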
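import pandas as pd

def func(row):
    # Compute the gap and the accuracy from the already-summed group totals.
    row['M1 Gap'] = row['Actual'] - row['M-1']
    row['Error_Per'] = 1 - (row['M1 Gap'] / row['Actual'])
    return row

# df is the DataFrame from the question; sum only the numeric inputs per group.
df.groupby('Product Category')[['Actual', 'M-1']].sum().apply(func, axis=1)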
                  Actual    M-1  M1 Gap  Error_Per
Product Category
Drink              190.0  170.0    20.0   0.894737
Food               140.0  150.0   -10.0   1.071429
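A row-wise apply is not strictly needed here, because both derived columns are plain column arithmetic. As a rough alternative, assuming the same data and the same formula, the calculation can be chained with `DataFrame.assign`; the variable name `summary` is only illustrative.

```python
import pandas as pd

data = {
    "Product Category": ["Drink", "Drink", "Drink", "Food", "Food", "Food"],
    "Line": ["Water", "Water", "Wine", "Fruit", "Fruit", "Fruit"],
    "Product": ["A", "B", "C", "D", "E", "F"],
    "Actual": [100, 50, 40, 20, 70, 50],
    "M-1": [120, 40, 10, 20, 80, 50],
}
df = pd.DataFrame(data)

summary = (
    df.groupby("Product Category")[["Actual", "M-1"]]
    .sum()
    # "M1 Gap" contains a space, so it is passed through dict unpacking.
    .assign(**{"M1 Gap": lambda g: g["Actual"] - g["M-1"]})
    .assign(Error_Per=lambda g: 1 - g["M1 Gap"] / g["Actual"])
)
print(summary)
```

This should give the same Error_Per values as the apply version, while the summed columns keep their integer dtype instead of being upcast to float.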