MSE function returning NaNs in Pandas DataFrame

I have a DataFrame with the following columns. Here is a sample:

df = pd.DataFrame({'product_id' : [20,20,20,20,20,22,22,22,22,22], 'date' : ['2020-06','2020-07','2020-08','2020-09',
                   '2020-10','2020-06','2020-07','2020-08','2020-09',
                   '2020-10'],'real': [1.2,3,4,5,1,1.5,2.9,5,6,1], 'pred': [1.3,4,4,5.1,1.2,1.5,3,6,5,1.5]})

I want to compute the mean squared error for each product:

for game_id in df['product_id'].unique():
    pred_g = df.query(f"product_id == '{game_id}'")
    print(game_id, " MSE = ", mse(pred_g["real"], pred_g["pred"]))

I wrote the mse function myself:

def mse(actual, predicted):
    actual = np.array(actual)
    predicted = np.array(predicted)
    differences = np.subtract(actual, predicted)
    squared_differences = np.square(differences)
    return squared_differences.mean()

But it returns only NaN for every product_id.

If I try to compute it with the sklearn function instead, I get the following error:

ValueError: Found array with 0 sample(s) (shape=(0,)) while a minimum of 1 is required.

I have checked the x and y variables; they have the same shape and are not empty.

What is going on here? I'm confused.

IIUC, you want to use GroupBy.var:

df['real'].sub(df['pred']).groupby(df['product_id']).var(ddof=0)

Output:

product_id
20    0.1336
22    0.4376
dtype: float64

The same calculation by hand:

s = df['real'].sub(df['pred'])
s.groupby(df['product_id']).apply(lambda x: x.sub(x.mean()).pow(2).mean())
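One caveat worth keeping in mind: the variance of the errors equals the MSE only when the mean error (the bias) is zero. In general, MSE = variance (with ddof=0) + (mean error)^2. The following is a small sketch that checks this on the sample data; the column names `variance`, `bias_sq` and `mse` are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "product_id": [20] * 5 + [22] * 5,
    "real": [1.2, 3, 4, 5, 1, 1.5, 2.9, 5, 6, 1],
    "pred": [1.3, 4, 4, 5.1, 1.2, 1.5, 3, 6, 5, 1.5],
})

# Per-row error and its per-product grouping
err = df["real"] - df["pred"]
grouped = err.groupby(df["product_id"])

variance = grouped.var(ddof=0)                       # population variance of the errors
bias = grouped.mean()                                # mean error per product
mse = err.pow(2).groupby(df["product_id"]).mean()    # mean squared error per product

# Illustrative column names, not from the original answer
check = pd.DataFrame({"variance": variance, "bias_sq": bias ** 2, "mse": mse})
print(check)
```

On this sample the mean error is not zero for either product, so the variance comes out smaller than the MSE.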

I tried your code and did not get any errors.

import pandas as pd  
import numpy as np

df = pd.DataFrame({'product_id' : [20,20,20,20,20,22,22,22,22,22], 'date' : ['2020-06','2020-07','2020-08','2020-09',
                   '2020-10','2020-06','2020-07','2020-08','2020-09',
                   '2020-10'],'real': [1.2,3,4,5,1,1.5,2.9,5,6,1], 'pred': [1.3,4,4,5.1,1.2,1.5,3,6,5,1.5]})
 
def mse(actual, predicted):
    actual = np.array(actual)
    predicted = np.array(predicted)
    differences = np.subtract(actual, predicted)
    squared_differences = np.square(differences)
    return squared_differences.mean()
    
for game_id in df['product_id'].unique():
    pred_g = df.query(f"product_id == '{game_id}'")
    print(game_id, " MSE = ", mse(pred_g["real"], pred_g["pred"]))

Output:

20  MSE =  0.21200000000000002
22  MSE =  0.45199999999999996
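Whether this loop prints numbers or NaN may depend on the pandas version and on the dtype of product_id. In the sample the ids are integers, while the query compares them with a quoted string such as '20'. If that comparison matches no rows, the mean of an empty array is NaN, and sklearn reports an array with 0 samples. Below is a minimal sketch that filters without the string comparison, assuming integer ids as in the sample; `mse_by_product` is a hypothetical helper name, not part of the original code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "product_id": [20] * 5 + [22] * 5,
    "real": [1.2, 3, 4, 5, 1, 1.5, 2.9, 5, 6, 1],
    "pred": [1.3, 4, 4, 5.1, 1.2, 1.5, 3, 6, 5, 1.5],
})

def mse(actual, predicted):
    """Mean squared error of two equally long sequences."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.mean(diff ** 2))

def mse_by_product(frame):
    # Hypothetical helper: select rows with a boolean mask so the id is
    # compared as a number, never as a string.
    results = {}
    for product_id in frame["product_id"].unique():
        rows = frame[frame["product_id"] == product_id]
        results[int(product_id)] = mse(rows["real"], rows["pred"])
    return results

# An integer column compared element-wise with a string matches nothing,
# which is how an empty selection (and then NaN) can arise.
matches_as_string = int((df["product_id"] == "20").sum())
matches_as_int = int((df["product_id"] == 20).sum())
print("rows matching '20' (string):", matches_as_string)
print("rows matching 20 (int):", matches_as_int)

print(mse_by_product(df))
```

Inside `query`, dropping the quotes (`f"product_id == {game_id}"`) should have the same effect when the ids are numeric. If the real column holds strings, the quotes are needed instead, so checking `df['product_id'].dtype` first is a reasonable step.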

Use sklearn.metrics.mean_squared_error per group (with squared=False it returns the RMSE, the square root of the MSE):

from sklearn.metrics import mean_squared_error
s = (df.groupby('product_id')
      .apply(lambda x: mean_squared_error(x['real'], x['pred'], squared=False)))
    
print (s)
product_id
20    0.460435
22    0.672309
dtype: float64

Or compute it by hand:

s = df['pred'].sub(df['real']).pow(2).groupby(df['product_id']).mean().pow(0.5)
    
print (s)
product_id
20    0.460435
22    0.672309
dtype: float64
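To get the plain MSE rather than its root, the square root can simply be dropped. The sketch below computes both per product with pandas only. It avoids the `squared` argument because, as far as I know, newer scikit-learn releases deprecate it, so its availability depends on the installed version.

```python
import pandas as pd

df = pd.DataFrame({
    "product_id": [20] * 5 + [22] * 5,
    "real": [1.2, 3, 4, 5, 1, 1.5, 2.9, 5, 6, 1],
    "pred": [1.3, 4, 4, 5.1, 1.2, 1.5, 3, 6, 5, 1.5],
})

# Squared error per row, then averaged within each product
sq_err = (df["pred"] - df["real"]).pow(2)
mse = sq_err.groupby(df["product_id"]).mean()

# RMSE is the square root of the MSE
rmse = mse.pow(0.5)

print(pd.DataFrame({"mse": mse, "rmse": rmse}))
```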