Why doesn't this code that uses mxnet to train a regression net converge?
Here is the code:
import mxnet
from mxnet import io, gluon, autograd
from mxnet.gluon import nn
from mxnet.gluon.data import ArrayDataset
ctx = mxnet.gpu() if mxnet.test_utils.list_gpus() else mxnet.cpu()
iter = io.CSVIter(data_csv="data/housing.csv", batch_size=100, data_shape=(10, ))
loss = gluon.loss.L2Loss()
net = nn.Sequential()
net.add(nn.Dense(1))
net.initialize(mxnet.init.Normal(sigma=0.01), ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.001})
for (i, iter_data) in enumerate(iter):
    data = iter_data.data[0]
    label_data = data[:, 8]
    train_data = data[:, 3]
    with autograd.record():
        l = loss(net(train_data), label_data)
    l.backward()
    trainer.step(100)
    print(l.mean().asnumpy())
The data is US house prices, and it looks like this:
-122.23,37.88,41.0,880.0,129.0,322.0,126.0,8.3252,452600.0,NEAR BAY
-122.22,37.86,21.0,7099.0,1106.0,2401.0,1138.0,8.3014,358500.0,NEAR BAY
-122.24,37.85,52.0,1467.0,190.0,496.0,177.0,7.2574,352100.0,NEAR BAY
-122.25,37.85,52.0,1274.0,235.0,558.0,219.0,5.6431,341300.0,NEAR BAY
-122.25,37.85,52.0,1627.0,280.0,565.0,259.0,3.8462,342200.0,NEAR BAY
-122.25,37.85,52.0,919.0,213.0,413.0,193.0,4.0368,269700.0,NEAR BAY
-122.25,37.84,52.0,2535.0,489.0,1094.0,514.0,3.6591,299200.0,NEAR BAY
-122.25,37.84,52.0,3104.0,687.0,1157.0,647.0,3.12,241400.0,NEAR BAY
-122.26,37.84,42.0,2555.0,665.0,1206.0,595.0,2.0804,226700.0,NEAR BAY
-122.25,37.84,52.0,3549.0,707.0,1551.0,714.0,3.6912,261100.0,NEAR BAY
-122.26,37.85,52.0,2202.0,434.0,910.0,402.0,3.2031,281500.0,NEAR BAY
-122.26,37.85,52.0,3503.0,752.0,1504.0,734.0,3.2705,241800.0,NEAR BAY
The data comes from https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/housing/housing.tgz
The result confuses me:
[1.4657609e+10]
[2.184351e+17]
[7.357278e+24]
[1.0737887e+32]
[nan]
[nan]
...
So what is wrong with my code?
==================== UPDATE ====================
I normalized the feature array with a z-score, but it didn't help (forgive my laziness in using numpy's functions to compute the z-score):
import mxnet
import numpy as np
from mxnet import io, gluon, autograd, nd
from mxnet.gluon import nn
from mxnet.gluon.data import ArrayDataset
ctx = mxnet.gpu() if mxnet.test_utils.list_gpus() else mxnet.cpu()
BATCH_SIZE = 100
iter = io.CSVIter(data_csv="data/housing.csv", batch_size=BATCH_SIZE, data_shape=(10, ))
loss = gluon.loss.L2Loss()
net = nn.Sequential()
net.add(nn.Dense(1))
net.initialize(mxnet.init.Normal(sigma=0.01), ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.001})
for (i, iter_data) in enumerate(iter):
    data = iter_data.data[0]
    label_data = data[:, 8]
    train_data = data[:, 3]
    train_data_np = train_data.asnumpy()
    stand = np.std(train_data_np)
    mean = np.mean(train_data_np)
    b = (train_data_np - mean) / stand
    train_data = nd.array(b)
    with autograd.record():
        l = loss(net(train_data), label_data)
    l.backward()
    trainer.step(BATCH_SIZE)
    print(l.mean().asnumpy())
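Several problems could be making your code behave this way: an overly simple model, too few features, non-normalized data... I suggest looking at the house price prediction example in the MXNet repository - https://github.com/apache/incubator-mxnet/tree/master/example/gluon/house_prices
The code is explained in detail in the following chapter of the D2L online book: http://d2l.ai/chapter_multilayer-perceptrons/kaggle-house-price.html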
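To illustrate the "non-normalized data" point, here is a minimal sketch in plain Python (no MXNet), using only column 3 and column 8 of the twelve sample rows quoted in the question. It runs full-batch gradient descent on a one-weight linear model with the same 0.5 * squared-error form that gluon.loss.L2Loss uses, averaged over the batch. The helper names `zscore` and `train`, the learning rate of 0.1, and the step counts are assumptions made for this illustration, and the sketch is not a drop-in replacement for the Gluon loop. With the raw values and a learning rate of 0.001, each update overshoots because the inputs are in the thousands, so the loss grows by many orders of magnitude per step, much like the output in the question. With both the input and the label z-scored, the loss stays finite and does not increase.

```python
import math

# (column 3, column 8) pairs copied from the sample rows quoted in the question
rows = [
    (880.0, 452600.0), (7099.0, 358500.0), (1467.0, 352100.0),
    (1274.0, 341300.0), (1627.0, 342200.0), (919.0, 269700.0),
    (2535.0, 299200.0), (3104.0, 241400.0), (2555.0, 226700.0),
    (3549.0, 261100.0), (2202.0, 281500.0), (3503.0, 241800.0),
]
xs = [r[0] for r in rows]
ys = [r[1] for r in rows]


def zscore(values):
    """Return values shifted to mean 0 and scaled to (population) std 1."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


def train(inputs, targets, lr, steps):
    """Full-batch gradient descent on pred = w * x + b.

    The loss is the batch mean of 0.5 * (pred - y)^2. The loss recorded at
    each step is the one computed before that step's parameter update.
    """
    w, b = 0.0, 0.0
    n = len(inputs)
    losses = []
    for _ in range(steps):
        errs = [w * x + b - y for x, y in zip(inputs, targets)]
        losses.append(sum(0.5 * e * e for e in errs) / n)
        grad_w = sum(e * x for e, x in zip(errs, inputs)) / n
        grad_b = sum(errs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return losses


# Raw inputs and labels with the learning rate from the question: the loss explodes.
raw_losses = train(xs, ys, lr=0.001, steps=5)

# Z-scored inputs AND labels (lr=0.1 is an assumed value): the loss stays bounded.
std_losses = train(zscore(xs), zscore(ys), lr=0.1, steps=200)

print("raw:         ", raw_losses[0], "->", raw_losses[-1])
print("standardized:", std_losses[0], "->", std_losses[-1])
```

Note that the update in the question z-scores only the input column within each batch and leaves the label in the hundreds of thousands, so the loss there would still be very large even if it stopped blowing up. Scaling the label as well (or predicting it on a log scale, as the D2L chapter does) is one way to keep the numbers manageable. Statistics computed once over the whole training set are usually preferable to per-batch statistics.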