Why doesn't this code, which uses MXNet to train a regression net, converge?

Here is the code:

import mxnet
from mxnet import io, gluon, autograd
from mxnet.gluon import nn
from mxnet.gluon.data import ArrayDataset
ctx =  mxnet.gpu() if mxnet.test_utils.list_gpus() else mxnet.cpu()

iter = io.CSVIter(data_csv="data/housing.csv", batch_size=100, data_shape=(10, ))


loss = gluon.loss.L2Loss()
net = nn.Sequential()
net.add(nn.Dense(1))
net.initialize(mxnet.init.Normal(sigma=0.01), ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.001})

for (i, iter_data) in enumerate(iter):
    data = iter_data.data[0]
    label_data = data[:, 8]   # column 8: median_house_value (the label)
    train_data = data[:, 3]   # column 3: total_rooms (a single raw feature)
    with autograd.record():
        l = loss(net(train_data), label_data)
    l.backward()
    trainer.step(100)
    print(l.mean().asnumpy())

The data is US house prices, and it looks like this:

-122.23,37.88,41.0,880.0,129.0,322.0,126.0,8.3252,452600.0,NEAR BAY
-122.22,37.86,21.0,7099.0,1106.0,2401.0,1138.0,8.3014,358500.0,NEAR BAY
-122.24,37.85,52.0,1467.0,190.0,496.0,177.0,7.2574,352100.0,NEAR BAY
-122.25,37.85,52.0,1274.0,235.0,558.0,219.0,5.6431,341300.0,NEAR BAY
-122.25,37.85,52.0,1627.0,280.0,565.0,259.0,3.8462,342200.0,NEAR BAY
-122.25,37.85,52.0,919.0,213.0,413.0,193.0,4.0368,269700.0,NEAR BAY
-122.25,37.84,52.0,2535.0,489.0,1094.0,514.0,3.6591,299200.0,NEAR BAY
-122.25,37.84,52.0,3104.0,687.0,1157.0,647.0,3.12,241400.0,NEAR BAY
-122.26,37.84,42.0,2555.0,665.0,1206.0,595.0,2.0804,226700.0,NEAR BAY
-122.25,37.84,52.0,3549.0,707.0,1551.0,714.0,3.6912,261100.0,NEAR BAY
-122.26,37.85,52.0,2202.0,434.0,910.0,402.0,3.2031,281500.0,NEAR BAY
-122.26,37.85,52.0,3503.0,752.0,1504.0,734.0,3.2705,241800.0,NEAR BAY

The data comes from https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/housing/housing.tgz

The results confuse me:

[1.4657609e+10]
[2.184351e+17]
[7.357278e+24]
[1.0737887e+32]
[nan]
[nan]
...

So what is wrong with my code?

==================== UPDATE ====================

I standardized the feature array with a z-score, but it doesn't help (forgive my laziness in using numpy functions to compute the z-score):

import mxnet
import numpy as np
from mxnet import io, gluon, autograd, nd
from mxnet.gluon import nn
from mxnet.gluon.data import ArrayDataset
ctx =  mxnet.gpu() if mxnet.test_utils.list_gpus() else mxnet.cpu()

BATCH_SIZE = 100

iter = io.CSVIter(data_csv="data/housing.csv", batch_size=BATCH_SIZE, data_shape=(10, ))


loss = gluon.loss.L2Loss()
net = nn.Sequential()
net.add(nn.Dense(1))
net.initialize(mxnet.init.Normal(sigma=0.01), ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.001})

for (i, iter_data) in enumerate(iter):
    data = iter_data.data[0]
    label_data = data[:, 8]            # column 8: median_house_value (the label)
    train_data = data[:, 3]            # column 3: total_rooms (a single raw feature)
    train_data_np = train_data.asnumpy()
    stand = np.std(train_data_np)      # std of this batch only
    mean = np.mean(train_data_np)      # mean of this batch only
    b = (train_data_np - mean) / stand
    train_data = nd.array(b)
    with autograd.record():
        l = loss(net(train_data), label_data)
    l.backward()
    trainer.step(BATCH_SIZE)
    print(l.mean().asnumpy())

There are probably several things that make your code behave like this: the model is very simple, it uses only a single feature, and the data is not normalized. With the raw values, the label (median_house_value) is on the order of 10^5 and the feature you picked (total_rooms, column 3) is on the order of 10^3, so the very first SGD step with learning_rate=0.001 moves the weight by roughly 10^5; the prediction then overshoots the label by several orders of magnitude, every subsequent step makes the error larger, and the loss explodes from 10^10 to 10^17 to 10^24 and finally to nan, which is exactly what your output shows. Standardizing the feature alone, as in your update, still leaves the label at the 10^5 scale, so the loss stays huge. I suggest you look at the house-price prediction example in the MXNet repository - https://github.com/apache/incubator-mxnet/tree/master/example/gluon/house_prices

That code is explained in detail in the following chapter of the D2L online book: http://d2l.ai/chapter_multilayer-perceptrons/kaggle-house-price.html
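
For what it's worth, here is a minimal sketch of the same Gluon pipeline with both the features and the label standardized over the whole dataset before training. It only illustrates the advice above, not the official example: the file name data/housing_numeric.csv, the learning rate, and the epoch count are assumptions, and it presumes the header row and the textual ocean_proximity column have already been stripped so that only the 9 numeric columns remain.

import mxnet
import numpy as np
from mxnet import gluon, autograd, nd
from mxnet.gluon import nn
from mxnet.gluon.data import ArrayDataset, DataLoader
ctx = mxnet.gpu() if mxnet.test_utils.list_gpus() else mxnet.cpu()

BATCH_SIZE = 100

# hypothetical file: the original CSV with the header row and the textual
# ocean_proximity column removed, leaving 9 numeric columns
raw = np.genfromtxt("data/housing_numeric.csv", delimiter=",")
raw = raw[~np.isnan(raw).any(axis=1)]   # drop rows with missing total_bedrooms

features = raw[:, :8]                   # longitude ... median_income
labels = raw[:, 8:9]                    # median_house_value

# standardize features AND label with statistics of the whole dataset,
# not per batch, so every batch sees the same scale
f_mean, f_std = features.mean(axis=0), features.std(axis=0)
l_mean, l_std = labels.mean(), labels.std()
features = (features - f_mean) / f_std
labels = (labels - l_mean) / l_std

dataset = ArrayDataset(nd.array(features), nd.array(labels))
loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)

loss = gluon.loss.L2Loss()
net = nn.Sequential()
net.add(nn.Dense(1))
net.initialize(mxnet.init.Normal(sigma=0.01), ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.01})

for epoch in range(10):
    total = 0.0
    for X, y in loader:
        X, y = X.as_in_context(ctx), y.as_in_context(ctx)
        with autograd.record():
            l = loss(net(X), y)
        l.backward()
        trainer.step(X.shape[0])
        total += l.mean().asscalar()
    print(epoch, total / len(loader))

Because the label is standardized too, the printed loss should start around 0.5 (half the label variance) instead of 10^10 and then decrease; to map predictions back to dollars, multiply the network output by l_std and add l_mean.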