MSE function returning NaNs in Pandas DataFrame
I have a dataframe with the following columns. Here is a sample:
df = pd.DataFrame({
    'product_id': [20, 20, 20, 20, 20, 22, 22, 22, 22, 22],
    'date': ['2020-06', '2020-07', '2020-08', '2020-09', '2020-10',
             '2020-06', '2020-07', '2020-08', '2020-09', '2020-10'],
    'real': [1.2, 3, 4, 5, 1, 1.5, 2.9, 5, 6, 1],
    'pred': [1.3, 4, 4, 5.1, 1.2, 1.5, 3, 6, 5, 1.5],
})
I want to compute the mean squared error for each product:
for game_id in df['product_id'].unique():
    pred_g = df.query(f"product_id == '{game_id}'")
    print(game_id, " MAE = ", mse(pred_g["real"], pred_g["pred"]))
I wrote the mse function myself:
def mse(actual, predicted):
    actual = np.array(actual)
    predicted = np.array(predicted)
    differences = np.subtract(actual, predicted)
    squared_differences = np.square(differences)
    return squared_differences.mean()
But it only returns NaN for every product_id.
If I try to compute it with the sklearn function instead, I get the following error:
ValueError: Found array with 0 sample(s) (shape=(0,)) while a minimum of 1 is required.
I have checked the x and y variables; they have the same shape and are not empty.
What is going on here? I'm confused.
IIUC, you want to use GroupBy.var:
df['real'].sub(df['pred']).groupby(df['product_id']).var(ddof=0)
Output:
product_id
20 0.1336
22 0.4376
dtype: float64
Manual calculation:
s = df['real'].sub(df['pred'])
s.groupby(df['product_id']).apply(lambda x: x.sub(x.mean()).pow(2).mean())
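A caveat on GroupBy.var: the variance of the errors equals the MSE only when the mean error within a group is zero. In general, MSE = variance (with ddof=0) + squared mean error. Here is a minimal sketch on the sample data that checks this; the variable names (err, var_per_group, bias_per_group, mse_per_group) are made up for the illustration and are not from the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "product_id": [20] * 5 + [22] * 5,
    "real": [1.2, 3, 4, 5, 1, 1.5, 2.9, 5, 6, 1],
    "pred": [1.3, 4, 4, 5.1, 1.2, 1.5, 3, 6, 5, 1.5],
})

# Signed error per row, grouped by product
err = df["real"].sub(df["pred"])
grouped = err.groupby(df["product_id"])

var_per_group = grouped.var(ddof=0)   # spread of the errors around their group mean
bias_per_group = grouped.mean()       # mean error per product
mse_per_group = err.pow(2).groupby(df["product_id"]).mean()

# The MSE should match variance + squared mean error for every group
print(pd.DataFrame({
    "var": var_per_group,
    "bias": bias_per_group,
    "mse": mse_per_group,
    "var_plus_bias_sq": var_per_group + bias_per_group ** 2,
}))
```

On this sample the mean error is not zero in either group, so the variance comes out smaller than the MSE.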
I tried your code and did not get any errors.
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'product_id': [20, 20, 20, 20, 20, 22, 22, 22, 22, 22],
    'date': ['2020-06', '2020-07', '2020-08', '2020-09', '2020-10',
             '2020-06', '2020-07', '2020-08', '2020-09', '2020-10'],
    'real': [1.2, 3, 4, 5, 1, 1.5, 2.9, 5, 6, 1],
    'pred': [1.3, 4, 4, 5.1, 1.2, 1.5, 3, 6, 5, 1.5],
})

def mse(actual, predicted):
    actual = np.array(actual)
    predicted = np.array(predicted)
    differences = np.subtract(actual, predicted)
    squared_differences = np.square(differences)
    return squared_differences.mean()

for game_id in df['product_id'].unique():
    pred_g = df.query(f"product_id == '{game_id}'")
    print(game_id, " MAE = ", mse(pred_g["real"], pred_g["pred"]))
Output:
20 MAE = 0.21200000000000002
22 MAE = 0.45199999999999996
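One plausible explanation for the NaN values in the question, assuming product_id is stored as integers as in the sample: the quotes around {game_id} turn the comparison value into a string, so no rows match and the mean is taken over an empty array. That would also fit the sklearn error about 0 samples. The sketch below reproduces the effect with a plain boolean mask (used here instead of query so the illustration does not depend on the query engine) and then compares against the integer value. The names empty, nan_result and rows are only for this example.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "product_id": [20] * 5 + [22] * 5,
    "real": [1.2, 3, 4, 5, 1, 1.5, 2.9, 5, 6, 1],
    "pred": [1.3, 4, 4, 5.1, 1.2, 1.5, 3, 6, 5, 1.5],
})

def mse(actual, predicted):
    """Mean of the squared differences between two equal-length sequences."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return np.square(diff).mean()

# Comparing an integer column with a string matches no rows
empty = df[df["product_id"] == "20"]
with warnings.catch_warnings():
    # NumPy warns when taking the mean of an empty array; silence it for the demo
    warnings.simplefilter("ignore", RuntimeWarning)
    nan_result = mse(empty["real"], empty["pred"])
print(len(empty), nan_result)

# Comparing with the integer value selects the rows for each product
for game_id in df["product_id"].unique():
    rows = df[df["product_id"] == game_id]
    print(game_id, "MSE =", mse(rows["real"], rows["pred"]))
```

If you prefer to keep query, dropping the quotes (f"product_id == {game_id}") should make the comparison numeric. It is also worth checking df.dtypes on the real data, since the column types there may differ from the sample.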
Use sklearn.metrics.mean_squared_error for each group:
from sklearn.metrics import mean_squared_error

# squared=False returns the square root of the MSE (the RMSE)
s = (df.groupby('product_id')
       .apply(lambda x: mean_squared_error(x['real'], x['pred'], squared=False)))
print(s)
product_id
20 0.460435
22 0.672309
dtype: float64
Or compute it manually:
s = df['pred'].sub(df['real']).pow(2).groupby(df['product_id']).mean().pow(0.5)
print(s)
product_id
20 0.460435
22 0.672309
dtype: float64
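Note that both snippets above return the root mean squared error, because of squared=False and the final .pow(0.5). If the plain MSE is what you are after, here is a sketch along the same lines without the square root. The helper name mse_by_group and its parameters are made up for this example. It sticks to pandas only, partly because, as far as I know, recent scikit-learn releases deprecated the squared argument in favor of a separate root_mean_squared_error function.

```python
import pandas as pd

df = pd.DataFrame({
    "product_id": [20] * 5 + [22] * 5,
    "real": [1.2, 3, 4, 5, 1, 1.5, 2.9, 5, 6, 1],
    "pred": [1.3, 4, 4, 5.1, 1.2, 1.5, 3, 6, 5, 1.5],
})

def mse_by_group(frame, key="product_id", actual="real", predicted="pred"):
    """Mean squared error of `predicted` against `actual` for each value of `key`."""
    squared_error = frame[predicted].sub(frame[actual]).pow(2)
    return squared_error.groupby(frame[key]).mean()

mse = mse_by_group(df)   # plain MSE per product
rmse = mse.pow(0.5)      # taking the root afterwards gives the RMSE
print(mse)
print(rmse)
```

Computing the squared errors once and grouping afterwards avoids calling a Python function per group, which tends to matter once there are many products.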