Different results from binary and categorical crossentropy

Question

I ran an experiment comparing binary_crossentropy and categorical_crossentropy. I wanted to understand how these two loss functions behave on the same problem.

I worked on a binary classification problem using this data.

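In the first experiment, I used 1 neuron in the last layer with a sigmoid activation function and binary_crossentropy as the loss. I trained this model 10 times and took the average accuracy. The average accuracy was 74.12760416666666.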

The code I used for the first experiment is below.

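from keras.models import Sequential
from keras.layers import Dense

# `dataset` is assumed to be a NumPy array loaded beforehand:
# columns 0-7 are the features, column 8 is the binary label.
total_acc = 0
for each_iter in range(0, 10):
    print(each_iter)
    X = dataset[:, 0:8]
    y = dataset[:, 8]
    # define the keras model: single sigmoid output unit
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # compile the keras model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # fit the keras model on the dataset
    model.fit(X, y, epochs=150, batch_size=32)
    # evaluate the keras model (on the same data it was trained on)
    _, accuracy = model.evaluate(X, y)
    print('Accuracy: %.2f' % (accuracy * 100))
    temp_acc = accuracy * 100
    total_acc += temp_acc

    del model

# average accuracy over the 10 runs
print('Average accuracy: %.2f' % (total_acc / 10))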

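In the second experiment, I used 2 neurons in the last layer with a softmax activation function and categorical_crossentropy as the loss. I converted my target `y` to categorical (one-hot) form, trained this model 10 times again, and took the average accuracy. The average accuracy was 66.92708333333334.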

The code I used for the second setup is below:

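from keras.models import Sequential
from keras.layers import Dense
# np_utils ships with older Keras releases; newer ones expose keras.utils.to_categorical
from keras.utils import np_utils

# `dataset` is assumed to be the same NumPy array as in the first experiment.
total_acc_v2 = 0
for each_iter in range(0, 10):
    print(each_iter)
    X = dataset[:, 0:8]
    y = dataset[:, 8]
    # one-hot encode the binary label into 2 columns
    y = np_utils.to_categorical(y)
    # define the keras model: two softmax output units
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(2, activation='softmax'))
    # compile the keras model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    # fit the keras model on the dataset
    model.fit(X, y, epochs=150, batch_size=32)
    # evaluate the keras model (on the same data it was trained on)
    _, accuracy = model.evaluate(X, y)
    print('Accuracy: %.2f' % (accuracy * 100))
    temp_acc = accuracy * 100
    total_acc_v2 += temp_acc
    del model

# average accuracy over the 10 runs
print('Average accuracy: %.2f' % (total_acc_v2 / 10))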

I thought these two experiments were equivalent and should give very similar results. What causes such a large difference in accuracy?
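The expectation that the two setups are equivalent can be checked numerically, independent of Keras. The sketch below is an illustration, not part of the original experiment: it uses random logits and labels (the names `z`, `two_logits`, `bce`, and `cce` are made up for this example) and assumes the softmax head's class-0 logit is fixed at 0. Under that assumption the softmax probability of class 1 matches the sigmoid output, and the two losses agree per sample.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(logits):
    # subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
z = rng.normal(size=5)           # logits of a 1-unit sigmoid head (random, illustrative)
y = rng.integers(0, 2, size=5)   # binary labels (random, illustrative)

# Equivalent 2-unit head: class-0 logit fixed at 0, class-1 logit equal to z
two_logits = np.stack([np.zeros_like(z), z], axis=1)

p_sigmoid = sigmoid(z)
p_softmax = softmax(two_logits)

# binary cross-entropy per sample
bce = -(y * np.log(p_sigmoid) + (1 - y) * np.log(1 - p_sigmoid))
# categorical cross-entropy per sample with one-hot targets
one_hot = np.eye(2)[y]
cce = -(one_hot * np.log(p_softmax)).sum(axis=1)

print(np.allclose(p_sigmoid, p_softmax[:, 1]))  # probabilities agree
print(np.allclose(bce, cce))                    # losses agree
```

The trained models are not parameter-for-parameter identical, since the 2-unit head has an extra set of weights and a different random initialization, but both can represent the same family of decision functions.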

Answer 1

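The cause of this behavior appears to be run-to-run randomness. I ran your code, and the sigmoid model reached an average accuracy of about 74, while the softmax model also reached an average accuracy of about 74.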
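One way to judge whether a gap between two averages is just noise is to record the accuracy of each run and look at the spread as well as the mean. The helper below is a minimal sketch of that idea; `summarize_runs` is a hypothetical name, and the numbers in the usage lines are placeholders chosen for easy arithmetic, not measured results.

```python
from statistics import mean, stdev

def summarize_runs(accuracies):
    """Return (mean, sample standard deviation) of per-run accuracies."""
    return mean(accuracies), stdev(accuracies)

# Placeholder values for illustration only; in the experiment these would be
# the `accuracy * 100` values collected inside the training loop.
example_runs = [70.0, 74.0, 78.0]
avg, spread = summarize_runs(example_runs)
print('mean = %.2f, std = %.2f' % (avg, spread))
```

If the difference between the sigmoid and softmax averages is comparable to this standard deviation, 10 runs are probably not enough to separate the two setups.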

machine-learning

neural-network

deep-learning

keras

loss-function