Why is the Keras MAPE metric exploding during training while the MSE loss is not?
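I implemented an LSTM in Keras to reproduce this paper. The strange behavior is simple: I use MSE as the loss function, with MAPE and MAE as metrics. During training, the MAPE explodes, but the MSE and MAE seem to train normally: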
Epoch 1/20
275/275 [==============================] - 191s 693ms/step - loss: 0.1005 - mape: 15794.8682 - mae: 0.2382 - val_loss: 0.0334 - val_mape: 24.9470 - val_mae: 0.1607
Epoch 2/20
275/275 [==============================] - 184s 669ms/step - loss: 0.0099 - mape: 6385.5464 - mae: 0.0725 - val_loss: 0.0078 - val_mape: 11.3268 - val_mae: 0.0803
Epoch 3/20
275/275 [==============================] - 186s 676ms/step - loss: 0.0025 - mape: 5909.3735 - mae: 0.0369 - val_loss: 0.0131 - val_mape: 14.9827 - val_mae: 0.1061
Epoch 4/20
275/275 [==============================] - 187s 678ms/step - loss: 0.0015 - mape: 4746.2788 - mae: 0.0278 - val_loss: 0.0142 - val_mape: 16.1894 - val_mae: 0.1122
Epoch 5/20
30/275 [==>...........................] - ETA: 2:38 - loss: 0.0012 - mape: 9.3647 - mae: 0.0246
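The MAPE explodes at the end of each epoch. What could be the reason for this specific behavior?
The MAPE is still decreasing with every epoch, so is this really a problem, given that it does not hinder the training process?
Your loss and MAPE are both decreasing, which looks fine. If the high MAPE values worry you, check whether any of your Y values are close to zero: MAPE is a percentage error, so each error is divided by the true value.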
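To illustrate the near-zero point, here is a minimal NumPy sketch. It is not the Keras implementation itself; it only mirrors the usual definition, where, as far as I know, Keras clips the denominator at a small epsilon (1e-7 by default). The arrays are made-up illustrative values, not data from the question.

```python
import numpy as np

EPSILON = 1e-7  # assumption: mirrors the default of keras.backend.epsilon()

def mape(y_true, y_pred, eps=EPSILON):
    """Mean absolute percentage error, with the denominator clipped at eps."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = np.maximum(np.abs(y_true), eps)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / denom)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)))

def mse(y_true, y_pred):
    """Mean squared error."""
    return np.mean((np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2)

# Hypothetical targets: three ordinary values and one that is almost zero.
y_true = np.array([0.50, 0.80, 0.30, 0.0001])
y_pred = np.array([0.52, 0.78, 0.31, 0.0201])

print("MSE :", mse(y_true, y_pred))    # small
print("MAE :", mae(y_true, y_pred))    # small
print("MAPE:", mape(y_true, y_pred))   # dominated by the near-zero target

# Same comparison without the near-zero target.
print("MAPE without the near-zero target:", mape(y_true[:3], y_pred[:3]))
```

All four absolute errors are about the same size, so MSE and MAE stay small, but the single near-zero target pushes the MAPE into the thousands of percent. Since the value Keras prints is a running average over the batches seen so far in the epoch, a few such samples in late batches would be consistent with the jump in your log, although I cannot confirm that without the data.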
MAPE results can be misleading. From Wikipedia:
Although the concept of MAPE sounds very simple and convincing, it has major drawbacks in practical application, and there are many studies on shortcomings and misleading results from MAPE.
- It cannot be used if there are zero values (which sometimes happens for example in demand data) because there would be a division by zero.
- For forecasts which are too low the percentage error cannot exceed 100%, but for forecasts which are too high there is no upper limit to the percentage error.
- MAPE puts a heavier penalty on negative errors, than on positive errors.
To overcome these issues with MAPE, there are some other measures proposed in literature:
- Mean Absolute Scaled Error (MASE)
- Symmetric Mean Absolute Percentage Error (sMAPE)
- Mean Directional Accuracy (MDA)
- Mean Arctangent Absolute Percentage Error (MAAPE)
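As a rough sketch of two of the alternatives listed above, sMAPE and MAAPE could be computed in NumPy as shown below. Several sMAPE variants exist in the literature; this one uses 2|y - ŷ| / (|y| + |ŷ|), and treating the 0/0 case as zero error is my own assumption. The function names and sample values are illustrative only.

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric MAPE in percent, bounded in [0, 200] for this variant."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    num = 2.0 * np.abs(y_true - y_pred)
    denom = np.abs(y_true) + np.abs(y_pred)
    # Assumption: when both values are exactly zero, count the error as zero.
    ratio = np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)
    return 100.0 * np.mean(ratio)

def maape(y_true, y_pred):
    """Mean arctangent absolute percentage error, bounded in [0, pi/2]."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = np.abs(y_true - y_pred)
    # arctan2(err, |y|) equals arctan(err / |y|) for |y| > 0 and
    # stays finite (pi/2) when the true value is exactly zero.
    return np.mean(np.arctan2(err, np.abs(y_true)))

# Hypothetical targets with one near-zero value, as in the MAPE discussion.
y_true = np.array([0.50, 0.80, 0.30, 0.0001])
y_pred = np.array([0.52, 0.78, 0.31, 0.0201])

print("sMAPE:", smape(y_true, y_pred))
print("MAAPE:", maape(y_true, y_pred))
```

Both measures stay bounded even when a target is zero or close to it, which is the main reason they are suggested as replacements. The near-zero sample still contributes the largest term, so they soften the problem rather than remove it.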