The XOR neural network written in DL4J does not work

I have started studying neural networks with the DL4J framework, beginning with XOR training. But no matter what I do, I get wrong results.

    MultiLayerConfiguration networkConfiguration = new NeuralNetConfiguration.Builder()
            .weightInit(WeightInit.SIGMOID_UNIFORM)
            .list()
            .layer(new DenseLayer.Builder()
                    .nIn(2).nOut(2)
                    .activation(Activation.SIGMOID)
                    .build())
            .layer( new DenseLayer.Builder()
                    .nIn(2).nOut(2)
                    .activation(Activation.SIGMOID)
                    .build())
            .layer( new OutputLayer.Builder()
                    .nIn(2).nOut(1)
                    .activation(Activation.SIGMOID)
                    .lossFunction(LossFunctions.LossFunction.XENT)
                    .build())
            .build();

    MultiLayerNetwork network = new MultiLayerNetwork(networkConfiguration);
    network.setListeners(new ScoreIterationListener(1));
    network.init();


    INDArray input = Nd4j.createFromArray(new double[][]{{0,1},{0,0},{1,0},{1,1}});

    INDArray output = Nd4j.createFromArray(new double[][]{{0^1},{0^0},{1^0},{1^1}});
    //   INDArray output = Nd4j.createFromArray(new double[]{0^1,0^0,1^1,1^0});
    //DataSet dataSet = new org.nd4j.linalg.dataset.DataSet(input,output);

    for (int i = 0; i < 10000; i++) {
        network.fit(input,output);
    }


    INDArray res = network.output(input,false);

    System.out.print(res);

Training results:

[[0.5748], 
 [0.5568], 
 [0.4497], 
 [0.4533]]

This looks like an old example. Where did you get it? Note that the project does not endorse or support random examples people pull from it. If it came from a book, be aware that those examples are several years old at this point and should not be used.

This should be up to date: https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/quickstart/modeling/feedforward/classification/ModelXOR.java

This configuration suffers from what I like to call "toy problem syndrome". DL4J assumes minibatches by default, and as such it scales the learning relative to the minibatch size of the input examples. If you're doing anything in the real world, 99% of problems are set up this way.
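Roughly, the difference looks like this (a sketch, under the assumption that minibatch mode averages the gradient over the $m$ examples in the batch rather than summing it):

$$\theta \leftarrow \theta - \eta \cdot \frac{1}{m}\sum_{i=1}^{m}\nabla_\theta L(x_i, y_i) \qquad \text{miniBatch(true), the default}$$

$$\theta \leftarrow \theta - \eta \cdot \sum_{i=1}^{m}\nabla_\theta L(x_i, y_i) \qquad \text{miniBatch(false)}$$

With the whole 4-row XOR set fed as a single "batch", the default would therefore take a step only a quarter the size of the full-batch step.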

This means that if you train a toy problem with the whole dataset in memory, every step the network takes isn't actually the full step. Our latest examples handle this by turning minibatch off:

      MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .updater(new Sgd(0.1))
            .seed(seed)
            .biasInit(0) // init the bias with 0 - empirical value, too
            // The networks can process the input more quickly and more accurately by ingesting
            // minibatches 5-10 elements at a time in parallel.
            // This example runs better without, because the dataset is smaller than the mini batch size
            .miniBatch(false)
            .list()
            .layer(new DenseLayer.Builder()
                .nIn(2)
                .nOut(4)
                .activation(Activation.SIGMOID)
                // random initialize weights with values between 0 and 1
                .weightInit(new UniformDistribution(0, 1))
                .build())
            .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nOut(2)
                .activation(Activation.SOFTMAX)
                .weightInit(new UniformDistribution(0, 1))
                .build())
            .build();

Note the miniBatch(false) in the configuration.
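For completeness, here is a minimal sketch of training and querying a network built from that configuration (my own sketch, not copied verbatim from the linked example; it assumes a recent DL4J/ND4J version). Note that because the output layer uses SOFTMAX with nOut(2), the labels must be one-hot encoded with two columns, unlike the single-column labels in the question:

    // Imports assumed: org.deeplearning4j.nn.multilayer.MultiLayerNetwork,
    //                   org.deeplearning4j.optimize.listeners.ScoreIterationListener,
    //                   org.nd4j.linalg.api.ndarray.INDArray, org.nd4j.linalg.factory.Nd4j
    MultiLayerNetwork net = new MultiLayerNetwork(conf);
    net.init();
    net.setListeners(new ScoreIterationListener(500));

    // The four XOR input rows, same as in the question.
    INDArray input = Nd4j.createFromArray(new double[][]{{0, 1}, {0, 0}, {1, 0}, {1, 1}});

    // One-hot labels for the two-unit softmax output:
    // column 0 = "XOR is 0", column 1 = "XOR is 1".
    INDArray labels = Nd4j.createFromArray(new double[][]{{0, 1}, {1, 0}, {0, 1}, {1, 0}});

    for (int i = 0; i < 10000; i++) {
        net.fit(input, labels);
    }

    // Each output row is a probability distribution over {0, 1};
    // after training it should be close to the one-hot labels above.
    System.out.println(net.output(input, false));

With minibatch turned off, each fit() call on the full four-row set applies the full gradient step, and the network should converge on XOR.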