MultiLayerNetwork to predict simple function

Question

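I'm trying to develop some intuition for machine learning. I looked at the examples in https://github.com/deeplearning4j/dl4j-0.4-examples and wanted to write one of my own. I took a simple function, a * a + b * b + c * c - a * b * c + a + b + c, generated 10000 outputs for random a, b, c, and tried to train my network on 90% of the inputs. The problem is that no matter what I do, my network cannot predict the remaining examples.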

Here is my code:

public class BasicFunctionNN {

    private static Logger log = LoggerFactory.getLogger(BasicFunctionNN.class);

    public static DataSetIterator generateFunctionDataSet() {
        Collection<DataSet> list = new ArrayList<>();
        for (int i = 0; i < 100000; i++) {
            double a = Math.random();
            double b = Math.random();
            double c = Math.random();

            double output = a * a + b * b + c * c - a * b * c + a + b + c;
            INDArray in = Nd4j.create(new double[]{a, b, c});
            INDArray out = Nd4j.create(new double[]{output});
            list.add(new DataSet(in, out));
        }
        return new ListDataSetIterator(list, list.size());
    }

    public static void main(String[] args) throws Exception {
        DataSetIterator iterator = generateFunctionDataSet();

        Nd4j.MAX_SLICES_TO_PRINT = 10;
        Nd4j.MAX_ELEMENTS_PER_SLICE = 10;

        final int numInputs = 3;
        int outputNum = 1;
        int iterations = 100;

        log.info("Build model....");
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .iterations(iterations).weightInit(WeightInit.XAVIER).updater(Updater.SGD).dropOut(0.5)
                .learningRate(.8).regularization(true)
                .l1(1e-1).l2(2e-4)
                .optimizationAlgo(OptimizationAlgorithm.LINE_GRADIENT_DESCENT)
                .list(3)
                .layer(0, new DenseLayer.Builder().nIn(numInputs).nOut(8)
                        .activation("identity")
                        .build())
                .layer(1, new DenseLayer.Builder().nIn(8).nOut(8)
                        .activation("identity")
                        .build())
                .layer(2, new OutputLayer.Builder(LossFunctions.LossFunction.RMSE_XENT)//LossFunctions.LossFunction.RMSE_XENT)
                        .activation("identity")
                        .weightInit(WeightInit.XAVIER)
                        .nIn(8).nOut(outputNum).build())
                .backprop(true).pretrain(false)
                .build();


        //run the model
        MultiLayerNetwork model = new MultiLayerNetwork(conf);
        model.init();
        model.setListeners(Collections.singletonList((IterationListener) new ScoreIterationListener(iterations)));

        //take the single full-size batch from the iterator (no record reader is involved) and split it 90/10 into train/test
        DataSet next = iterator.next();
        SplitTestAndTrain testAndTrain = next.splitTestAndTrain(0.9);
        System.out.println(testAndTrain.getTrain());

        model.fit(testAndTrain.getTrain());

        //evaluate the model
        Evaluation eval = new Evaluation(10);
        DataSet test = testAndTrain.getTest();
        INDArray output = model.output(test.getFeatureMatrix());
        eval.eval(test.getLabels(), output);
        log.info(">>>>>>>>>>>>>>");
        log.info(eval.stats());

    }
}

I have also played with the learning rate, and often the score does not improve at all:

10:48:51.404 [main] DEBUG o.d.o.solvers.BackTrackLineSearch - Exited line search after maxIterations termination condition; score did not improve (bestScore=0.8522868127536543, scoreAtStart=0.8522868127536543). Resetting parameters

I also tried relu as the activation function.

Answer 1

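One obvious problem is that you are trying to model a nonlinear function with a linear model. Your neural network has no (nonlinear) activation functions, so it can effectively only express functions of the form W1a + W2b + W3c + W4. It does not matter how many hidden units you create; as long as no nonlinear activation function is used, your network degenerates to a simple linear model.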
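To make the collapse argument concrete, here is a small plain-Java sketch (it does not use DL4J; the class and method names are hypothetical). It draws random weights for the question's 3 -> 8 -> 8 -> 1 layer sizes with identity activations, folds them into one affine layer W = W3 W2 W1, b = W3 (W2 b1 + b2) + b3, and checks that the deep and the single-layer versions agree on random inputs.

```java
import java.util.Random;

/** Hypothetical demo: stacked identity-activation dense layers equal one affine layer. */
public class LinearCollapse {

    // One dense layer with identity activation: y = W x + b
    static double[] dense(double[][] w, double[] b, double[] x) {
        double[] y = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            double s = b[i];
            for (int j = 0; j < x.length; j++) {
                s += w[i][j] * x[j];
            }
            y[i] = s;
        }
        return y;
    }

    // Matrix product a (n x m) times b (m x p)
    static double[][] matMul(double[][] a, double[][] b) {
        int n = a.length, m = b.length, p = b[0].length;
        double[][] c = new double[n][p];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < m; k++)
                for (int j = 0; j < p; j++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    static double[][] randomMatrix(Random r, int rows, int cols) {
        double[][] m = new double[rows][cols];
        for (double[] row : m)
            for (int j = 0; j < cols; j++) row[j] = r.nextGaussian();
        return m;
    }

    static double[] randomVector(Random r, int n) {
        double[] v = new double[n];
        for (int i = 0; i < n; i++) v[i] = r.nextGaussian();
        return v;
    }

    /** Largest |deep(x) - (W x + b)| over random inputs x in [0, 1)^3. */
    static double maxCollapseError(long seed, int trials) {
        Random r = new Random(seed);
        double[][] w1 = randomMatrix(r, 8, 3); double[] b1 = randomVector(r, 8);
        double[][] w2 = randomMatrix(r, 8, 8); double[] b2 = randomVector(r, 8);
        double[][] w3 = randomMatrix(r, 1, 8); double[] b3 = randomVector(r, 1);

        // Equivalent single layer: W = W3 W2 W1 (1 x 3), b = W3 (W2 b1 + b2) + b3
        double[][] w = matMul(w3, matMul(w2, w1));
        double[] b = dense(w3, b3, dense(w2, b2, b1));

        double maxErr = 0;
        for (int t = 0; t < trials; t++) {
            double[] x = {r.nextDouble(), r.nextDouble(), r.nextDouble()};
            double deep = dense(w3, b3, dense(w2, b2, dense(w1, b1, x)))[0];
            double shallow = dense(w, b, x)[0];
            maxErr = Math.max(maxErr, Math.abs(deep - shallow));
        }
        return maxErr;
    }

    public static void main(String[] args) {
        System.out.println("max |deep - single affine layer| = " + maxCollapseError(42L, 1000));
    }
}
```

The difference is only floating-point rounding. Applying a nonlinearity such as `Math.tanh` to the hidden layers breaks this equivalence, which is the ingredient the network in the question is missing.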

Update

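There are also lots of "small weird things", including but not limited to:

you are using a huge learning rate (0.8)
you are using a lot of regularization (and quite a complex one: using both l1 and l2 regularizers for regression is not a common approach, especially in neural networks) for a problem where you need none
rectifier units may not be the best choice for expressing the squaring and multiplication operations you are looking for. Rectifiers are very good for classification, especially in deeper architectures, but not for shallow regression. Try sigmoid-like activation functions (tanh, sigmoid) instead.
I am not entirely sure what "iteration" means in this implementation, but usually it is the number of samples/minibatches used for training. If so, using only 100 may be orders of magnitude too few for gradient descent learning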
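As a rough illustration of these suggestions together (a small learning rate, no l1/l2 or dropout, a tanh hidden layer and many more gradient updates), here is a hand-rolled plain-Java sketch. It is not DL4J code and does not use its API; the class name, the 16 hidden units, the 0.01 learning rate, the 20 epochs and the seed are all assumptions chosen for illustration. It trains on 90% of 10,000 random (a, b, c) samples with per-sample SGD on squared error and reports the test MSE next to a baseline that always predicts the training mean.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical plain-Java 3 -> 16 -> 1 regression net: tanh hidden layer, linear output, SGD. */
public class TinyTanhRegression {
    static final int IN = 3, HIDDEN = 16;

    // The function from the question
    static double target(double a, double b, double c) {
        return a * a + b * b + c * c - a * b * c + a + b + c;
    }

    final double[][] w1 = new double[HIDDEN][IN];
    final double[] b1 = new double[HIDDEN];
    final double[] w2 = new double[HIDDEN];
    double b2 = 0.0;

    TinyTanhRegression(Random r) {
        // Xavier/Glorot-style uniform initialization; biases start at zero
        double lim1 = Math.sqrt(6.0 / (IN + HIDDEN));
        double lim2 = Math.sqrt(6.0 / (HIDDEN + 1));
        for (int i = 0; i < HIDDEN; i++) {
            for (int j = 0; j < IN; j++) w1[i][j] = (2 * r.nextDouble() - 1) * lim1;
            w2[i] = (2 * r.nextDouble() - 1) * lim2;
        }
    }

    // Forward pass: stores tanh activations in h, returns the linear (identity) output
    double forward(double[] x, double[] h) {
        double y = b2;
        for (int i = 0; i < HIDDEN; i++) {
            double z = b1[i];
            for (int j = 0; j < IN; j++) z += w1[i][j] * x[j];
            h[i] = Math.tanh(z);
            y += w2[i] * h[i];
        }
        return y;
    }

    // One plain SGD step on 0.5 * (y - t)^2; no l1/l2 penalty and no dropout
    void step(double[] x, double t, double lr) {
        double[] h = new double[HIDDEN];
        double err = forward(x, h) - t; // dLoss/dy
        for (int i = 0; i < HIDDEN; i++) {
            // Gradient w.r.t. the hidden pre-activation, computed before w2[i] is updated
            double gz = err * w2[i] * (1 - h[i] * h[i]);
            w2[i] -= lr * err * h[i];
            b1[i] -= lr * gz;
            for (int j = 0; j < IN; j++) w1[i][j] -= lr * gz * x[j];
        }
        b2 -= lr * err;
    }

    double mse(double[][] xs, double[] ts) {
        double[] h = new double[HIDDEN];
        double sum = 0;
        for (int n = 0; n < xs.length; n++) {
            double d = forward(xs[n], h) - ts[n];
            sum += d * d;
        }
        return sum / xs.length;
    }

    /** Returns {test MSE of always predicting the training mean, test MSE of the trained net}. */
    static double[] run(long seed, int epochs, double lr) {
        Random r = new Random(seed);
        int total = 10000, train = 9000; // 90/10 split, as in the question
        double[][] xs = new double[total][];
        double[] ts = new double[total];
        for (int n = 0; n < total; n++) {
            xs[n] = new double[] {r.nextDouble(), r.nextDouble(), r.nextDouble()};
            ts[n] = target(xs[n][0], xs[n][1], xs[n][2]);
        }
        double[][] testX = Arrays.copyOfRange(xs, train, total);
        double[] testT = Arrays.copyOfRange(ts, train, total);

        double mean = 0;
        for (int n = 0; n < train; n++) mean += ts[n] / train;
        double baseline = 0;
        for (double t : testT) baseline += (t - mean) * (t - mean) / testT.length;

        TinyTanhRegression net = new TinyTanhRegression(r);
        for (int e = 0; e < epochs; e++) {
            for (int n = 0; n < train; n++) net.step(xs[n], ts[n], lr);
        }
        return new double[] {baseline, net.mse(testX, testT)};
    }

    public static void main(String[] args) {
        double[] res = run(1L, 20, 0.01);
        System.out.printf("baseline test MSE = %.4f, trained test MSE = %.4f%n", res[0], res[1]);
    }
}
```

With per-sample updates, each epoch is 9,000 gradient steps, so 20 epochs amount to 180,000 updates. That is the kind of scale the last point is contrasting with 100.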
