Computing MSE per row in pandas dataframe

Question

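I have the dataframe below with many columns (2016_x, 2016_y, 2017_x, and so on), where x holds my actual values and y holds the predicted values.

How can I compute the mean squared error (MSE) row-wise, so that I can see it for each fruit? Here is the code: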

import pandas as pd
s={'Fruits':['Apple','Mango'],'2016_x':[2,3],'2017_x':[4,5],'2018_x':[12,13],'2016_y':[3,4],'2017_y':[3,4],'2018_y':[12,13]}
p=pd.DataFrame(data=s)

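This is what the dataframe looks like:

  Fruits  2016_x  2017_x  2018_x  2016_y  2017_y  2018_y
0  Apple       2       4      12       3       3      12
1  Mango       3       5      13       4       4      13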

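The desired output should show the MSE for Apple and for Mango, i.e. one value per row. The MSE should be based on the difference between the x and y values of each year. Basically, I need the overall MSE for Apple and for Mango separately.

I know the MSE can be calculated as: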

MSE = np.mean((p['x'] - p['y'])**2, axis=1)

But how do I compute it for a dataframe laid out like this?

Answer 1

Set the index to Fruits and convert the columns into a MultiIndex of (x/y, year):

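# Use the fruit names as the row index
p = p.set_index('Fruits')
# Split labels such as '2016_x' into ('2016', 'x'), giving a two-level column MultiIndex
p.columns = p.columns.str.split('_', expand=True)
# Swap the levels so that x/y is the outer level and the year is the inner level
p = p.swaplevel(axis=1)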

#         x               y          
#         2016 2017 2018  2016 2017 2018
# Fruits                                
# Apple   2    4    12    3    3    12
# Mango   3    5    13    4    4    13
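To make the reshaping step easier to follow, here is a small self-contained sketch that rebuilds the question's dataframe and inspects the intermediate result of the split. The variable name cols is introduced here purely for illustration.

```python
import pandas as pd

s = {'Fruits': ['Apple', 'Mango'],
     '2016_x': [2, 3], '2017_x': [4, 5], '2018_x': [12, 13],
     '2016_y': [3, 4], '2017_y': [3, 4], '2018_y': [12, 13]}
p = pd.DataFrame(data=s).set_index('Fruits')

# With expand=True, splitting the column labels on '_' returns a MultiIndex
# whose entries are (year, x/y) tuples instead of plain strings.
cols = p.columns.str.split('_', expand=True)
print(type(cols).__name__)
print(cols.tolist())

# After assigning the MultiIndex and swapping its levels, p['x'] and p['y']
# are sub-frames that share the same year columns.
p.columns = cols
p = p.swaplevel(axis=1)
print(p['x'])
print(p['y'])
```

Because both sub-frames end up with identical column labels (the years), arithmetic between them aligns year by year.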

The MSE computation can then be vectorized:

mse = p['x'].sub(p['y']).pow(2).mean(axis=1)

# Fruits
# Apple    0.666667
# Mango    0.666667
# dtype: float64
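As a hand check on these values, the arithmetic can be written out using the numbers from the question's dataframe. The subscripted symbols are notation introduced here only for this check.

```latex
\[
\mathrm{MSE}_{\text{Apple}}
  = \frac{(2-3)^2 + (4-3)^2 + (12-12)^2}{3}
  = \frac{1 + 1 + 0}{3}
  = \frac{2}{3} \approx 0.6667
\]
\[
\mathrm{MSE}_{\text{Mango}}
  = \frac{(3-4)^2 + (5-4)^2 + (13-13)^2}{3}
  = \frac{1 + 1 + 0}{3}
  = \frac{2}{3} \approx 0.6667
\]
```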

Note that chaining sub and pow is just a cleaner way of applying - and ** to the columns:

mse = ((p['x'] - p['y']) ** 2).mean(axis=1)
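For comparison, here is a sketch of a different route that skips the MultiIndex. It is not the approach described in this answer, and it assumes every actual column ends in _x and every predicted column ends in _y. The names actual and predicted are made up for this example.

```python
import pandas as pd

s = {'Fruits': ['Apple', 'Mango'],
     '2016_x': [2, 3], '2017_x': [4, 5], '2018_x': [12, 13],
     '2016_y': [3, 4], '2017_y': [3, 4], '2018_y': [12, 13]}
p = pd.DataFrame(data=s).set_index('Fruits')

# Pick the columns by suffix, then rename them to the bare year so that
# the two frames align column by column when subtracted.
actual = p.filter(regex='_x$').rename(columns=lambda c: c.split('_')[0])
predicted = p.filter(regex='_y$').rename(columns=lambda c: c.split('_')[0])

# Row-wise mean of the squared differences
mse = ((actual - predicted) ** 2).mean(axis=1)
print(mse)
```

This leaves the original column labels of p untouched, at the cost of relying on the suffix naming convention.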
