loss: nan when fitting training data in tensorflow keras regression network
I'm trying to reproduce a regression network from a book, but no matter what I try I only get a nan loss while fitting. I have already checked whether it could be caused by:
- Bad input data: my data is clean
- Unscaled input data: I tried both StandardScaler and MinMaxScaler, but no dice
- Unscaled output data: I also tried scaling it to [0, 1] using the training set, but new instances fall outside that range
- Exploding gradients: possible, but it still happens even with regularization
- Learning rate too steep: it still happens even when set to a very low value
- Unbounded steps: even gradient clipping doesn't fix it
- Wrong error metric: changing from mse to mean absolute error doesn't work either
- Batch too large: cutting the training data down to the first 200 entries doesn't work either
What else could cause nans in the loss function?
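(For reference, here is a minimal numpy-only sketch of how I would rule out bad values in the arrays; `count_bad` is a hypothetical helper, and note that `np.nan in arr` is not a reliable membership test because nan compares unequal to itself:)

```python
import numpy as np

def count_bad(arr):
    """Return (nan_count, inf_count) for an array.
    `np.nan in arr` does NOT work: elementwise nan != nan, so it never fires."""
    arr = np.asarray(arr, dtype=float)
    return int(np.isnan(arr).sum()), int(np.isinf(arr).sum())

X = np.array([[0.5, -1.2, np.nan], [3.4, np.inf, 0.0]])
print(count_bad(X))  # (1, 1)
```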
Edit: this also happens with every example model from the internet.
I'm at my wits' end.
The data looks like this:
X_train[:5]
Out[4]:
array([[-3.89243447e-01, -6.10268198e-01, 7.23982383e+00,
7.68512713e+00, -9.15360303e-01, -4.34319791e-02,
1.69375104e+00, -2.66593858e-01],
[-1.00512751e+00, -6.10268198e-01, 5.90241386e-02,
6.22319189e-01, -7.82304360e-01, -6.23993472e-02,
-8.17899555e-01, 1.52950349e+00],
[ 5.45617265e-01, 5.78632450e-01, -1.56942033e-01,
-2.49063893e-01, -5.28447626e-01, -3.67342889e-02,
-8.31983577e-01, 7.11281365e-01],
[-1.53276576e-01, 1.84679314e+00, -9.75702024e-02,
3.03921163e-01, -5.96726334e-01, -6.73883756e-02,
-7.14616727e-01, 6.56400612e-01],
[ 1.97163670e+00, -1.56138872e+00, 9.87949430e-01,
-3.36887553e-01, -3.42869600e-01, 5.08919289e-03,
-6.86448683e-01, 3.12148621e-01]])
X_valid[:5]
Out[5]:
array([[ 2.06309546e-01, 1.21271280e+00, -7.86614121e-01,
1.36422365e-01, -6.81637034e-01, -1.12999850e-01,
-8.78930317e-01, 7.21259683e-01],
[ 7.12374210e-01, 1.82332234e-01, 2.24876920e-01,
-2.22866905e-02, 1.51713346e-01, -2.62325989e-02,
8.01762978e-01, -1.20954497e+00],
[ 5.86851369e+00, 2.61592277e-01, 1.86656568e+00,
-9.86220816e-02, 7.11794858e-02, -1.50302387e-02,
9.05045806e-01, -1.38915470e+00],
[-1.81402984e-01, -5.54478959e-02, -6.23050382e-02,
3.15382948e-02, -2.41326907e-01, -4.58773896e-02,
-8.74235643e-01, 7.86118754e-01],
[ 5.02584914e-01, -6.10268198e-01, 8.08807908e-01,
1.22787966e-01, -3.13107087e-01, 4.73927994e-03,
1.14447418e+00, -8.00433903e-01]])
y_train[:5]
Out[6]:
array([[-0.4648844 ],
[-1.26625476],
[-0.11064919],
[ 0.55441007],
[ 1.19863195]])
y_valid[:5]
Out[7]:
array([[ 2.018235 ],
[ 1.25593471],
[ 2.54525539],
[ 0.04215816],
[-0.39716296]])
Code:
# keras.__version__: 2.4.0
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from tensorflow import keras
import numpy as np
housing = fetch_california_housing()
X_train_full, X_test, y_train_full, y_test = train_test_split(housing.data, housing.target)
X_train, X_valid, y_train, y_valid = train_test_split(X_train_full, y_train_full)
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)
print(f'X_train: {X_train.shape}, X_valid: {X_valid.shape}, y_train: {y_train.shape}, y_valid: {y_valid.shape}')
print(f'X_test: {X_test.shape}, y_test: {y_test.shape}')
# `np.nan in X_train` never fires (nan != nan); use np.isnan instead
assert not np.isnan(X_train).any()
assert not np.isnan(X_valid).any()
scalery=StandardScaler()
y_train=scalery.fit_transform(y_train.reshape(len(y_train),1))
y_valid=scalery.transform(y_valid.reshape(len(y_valid),1))
y_test=scalery.transform(y_test.reshape(len(y_test),1))
# initializers: relu -> he_uniform, tanh -> glorot
model = keras.models.Sequential([
    keras.layers.Dense(30, activation="relu", input_shape=X_train.shape[1:],
                       kernel_initializer="he_uniform",
                       kernel_regularizer="l1"),
    keras.layers.Dense(1),
])
optimizer = keras.optimizers.SGD(learning_rate=0.0001, clipvalue=1.0)
model.compile(loss=keras.losses.MeanSquaredError(), optimizer=optimizer)
history = model.fit(X_train[0:200], y_train[0:200],
                    epochs=5,
                    validation_data=(X_valid[0:20], y_valid[0:20]))
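(One thing worth knowing: a single non-finite value anywhere in a batch is enough to poison the mean loss for every step afterwards. This numpy-only sketch of a forward pass illustrates the propagation; the weights and shapes are made up for illustration:)

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 30))
X = rng.normal(size=(5, 8))
X[0, 3] = np.nan                      # one bad value in one sample

h = np.maximum(X @ W, 0.0)            # ReLU: max(nan, 0) is still nan
y_pred = h @ rng.normal(size=(30, 1)) # nan spreads through the output layer
loss = np.mean((y_pred - 0.0) ** 2)   # the mean over the batch is now nan
print(np.isnan(loss))  # True
```

Keras also ships a `keras.callbacks.TerminateOnNaN` callback that stops `fit()` on the first non-finite loss, which helps narrow down the offending batch.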
Output:
X_train:(11610, 8), X_valid: (3870, 8), y_train: (11610,), y_valid:(3870,)
X_test: (5160, 8), y_test: (5160,)
Epoch 1/5
7/7 [==============================] - 0s 24ms/step - loss: nan - val_loss: nan
Epoch 2/5
7/7 [==============================] - 0s 4ms/step - loss: nan - val_loss: nan
Epoch 3/5
7/7 [==============================] - 0s 4ms/step - loss: nan - val_loss: nan
Epoch 4/5
7/7 [==============================] - 0s 5ms/step - loss: nan - val_loss: nan
Epoch 5/5
7/7 [==============================] - 0s 4ms/step - loss: nan - val_loss: nan
Interesting reads (that didn't help):
- https://stats.stackexchange.com/questions/362461/is-it-better-to-avoid-relu-as-activation-function-if-input-data-has-plenty-of-ne
- https://discuss.tensorflow.org/t/getting-nan-for-loss/4826
- https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/
- https://keras.io/api/optimizers/sgd/
- https://keras.io/api/layers/regularizers/
- https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html
I found the answer to my own question:
It turns out that TensorFlow currently does not run on Python 3.10.
After downgrading my Python version to 3.8, everything started working.
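(A quick environment sanity check would have surfaced this up front; the 3.10 cutoff below reflects the situation at the time of writing, when TensorFlow wheels supported Python 3.7-3.9, and is an assumption about your environment:)

```python
import sys

# TensorFlow wheels of this era only shipped for Python 3.7-3.9,
# so warn early instead of debugging silent nan losses later.
major, minor = sys.version_info[:2]
print(f"Python {major}.{minor}")
if (major, minor) >= (3, 10):
    print("warning: this TensorFlow build may not support Python >= 3.10")
```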