Why is pretraining for a DNN not specified in Keras?

The question is more about the training algorithm for DNNs than about the Keras software itself.

As far as I know, deep neural networks owe their success to improvements in the training algorithm. Since the 1980s the BP (backpropagation) algorithm has been used to train neural networks, but it runs into overfitting when the network is deep. About 10 years ago, Hinton improved the algorithm by first pretraining the network with unlabeled data and only then applying BP; this pretraining plays an important role in avoiding overfitting.
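To make concrete what I mean by pretraining, here is a minimal sketch (my own illustration, not from any Keras example) of greedy layer-wise pretraining with autoencoders, written against the same old-style Keras API as the script below; the layer sizes, epoch counts and optimizer are assumptions, not a recipe:

import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense, Activation

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(X, n_hidden, nb_epoch=5, batch_size=128):
    """Fit a one-hidden-layer autoencoder on unlabeled X; return the encoder
    weights and the encoded representation that feeds the next layer."""
    n_in = X.shape[1]
    ae = Sequential()
    ae.add(Dense(n_hidden, input_shape=(n_in,)))   # encoder
    ae.add(Activation('sigmoid'))
    ae.add(Dense(n_in))                            # decoder reconstructs X
    ae.add(Activation('sigmoid'))
    ae.compile(loss='mse', optimizer='rmsprop')
    ae.fit(X, X, nb_epoch=nb_epoch, batch_size=batch_size, verbose=0)
    W, b = ae.layers[0].get_weights()              # keep only the encoder part
    return [W, b], sigmoid(np.dot(X, W) + b)

# Pretrain two hidden layers on the (unlabeled) inputs, then initialise the
# supervised network with these weights and fine-tune everything with BP/SGD:
# w1, h1 = pretrain_layer(X_train, 512)
# w2, h2 = pretrain_layer(h1, 512)
# model.layers[0].set_weights(w1)
# model.layers[3].set_weights(w2)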

However, when I started playing with Keras, the MNIST DNN example below, trained with the SGD algorithm, reaches very high prediction accuracy without any mention of a pretraining step. So I started to wonder where the pretraining went. Have I misunderstood the training algorithm for deep learning (I think classic BP and SGD are roughly the same thing)? Or has some new training technique replaced the pretraining procedure?

Thank you very much for your help!

'''Trains a simple deep NN on the MNIST dataset.
Gets to 98.40% test accuracy after 20 epochs
(there is *a lot* of margin for parameter tuning).
2 seconds per epoch on a K520 GPU.
'''

from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD, Adam, RMSprop
from keras.utils import np_utils


batch_size = 128
nb_classes = 10
nb_epoch = 20

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()
model.add(Dense(512, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))

model.summary()

model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])

history = model.fit(X_train, Y_train,
                    batch_size=batch_size, nb_epoch=nb_epoch,
                    verbose=1, validation_data=(X_test, Y_test))
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

You are wrong.

Past vs. today

The difference between the neural networks of the past and the neural networks of today is not the training algorithm. Every DNN is trained with backpropagation based on some SGD-style algorithm, exactly as in the past. (There are some newer algorithms that try to reduce parameter tuning with adaptive learning rates, e.g. Adam, RMSprop and co.; but plain SGD is still the most common algorithm and was used for AlphaGo, for example.)
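For instance, a sketch of what that swap looks like for the `model` in the question's script (the learning-rate and momentum values are illustrative assumptions, not tuned settings):

from keras.optimizers import SGD

# Recompile the question's model with plain momentum SGD instead of RMSprop.
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(lr=0.01, momentum=0.9, nesterov=True),
              metrics=['accuracy'])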

The difference is only in size = the number of layers (depth; made possible by GPU-based evaluation) and the choice of activation function. ReLU simply works better than the classical sigmoid or tanh activations (with respect to speed and stability).
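If you want to see this for yourself, a quick sketch: rebuild the question's architecture with sigmoid activations and compare; on MNIST it will typically train more slowly and end up less accurate than the ReLU version (exact numbers depend on the run):

from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation

sigmoid_model = Sequential()
sigmoid_model.add(Dense(512, input_shape=(784,)))
sigmoid_model.add(Activation('sigmoid'))   # was 'relu' in the question
sigmoid_model.add(Dropout(0.2))
sigmoid_model.add(Dense(512))
sigmoid_model.add(Activation('sigmoid'))   # was 'relu' in the question
sigmoid_model.add(Dropout(0.2))
sigmoid_model.add(Dense(10))
sigmoid_model.add(Activation('softmax'))
sigmoid_model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
                      metrics=['accuracy'])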

Pretraining

I also think that pretraining was very popular 5-10 years ago, but nobody does it today (if you have enough data)! Let me quote from here:

It's true that unsupervised pre-training was initially what made it possible to train deeper networks, but the last few years the pre-training approach has been largely obsoleted. Nowadays, deep neural networks are a lot more similar to their 80's cousins. Instead of pre-training, the difference is now in the activation functions and regularisation methods used (and sometimes in the optimisation algorithm, although much more rarely). I would say that the "pre-training era", which started around 2006, ended in the early '10s when people started using rectified linear units (ReLUs), and later dropout, and discovered that pre-training was no longer beneficial for this type of networks.

I can recommend these slides as an introduction to modern deep learning (as a starting point).

Pretraining has actually regained a lot of attention recently in the NLP community; see OpenAI's GPT: the idea is that pretraining acts as an unsupervised initialization step before fine-tuning the model on supervised data. The reason is that unlabeled data is far more abundant than its labeled counterpart, and it can be exploited to derive sensible weights inside the model that capture the hidden links within the structure of the dataset. Hope the explanation is not too dumb :)
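As a toy illustration of that pretrain-then-fine-tune pattern in Keras terms (not GPT itself; the weight arrays and dataset names below are hypothetical stand-ins):

import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense, Activation

# Hypothetical stand-ins for weights obtained during an unsupervised
# pretraining phase; in practice they would come from the pretraining step.
pretrained_w1 = [(0.01 * np.random.randn(784, 512)).astype('float32'),
                 np.zeros(512, dtype='float32')]
pretrained_w2 = [(0.01 * np.random.randn(512, 512)).astype('float32'),
                 np.zeros(512, dtype='float32')]

finetune_model = Sequential()
finetune_model.add(Dense(512, input_shape=(784,), weights=pretrained_w1,
                         trainable=False))       # frozen pretrained layer
finetune_model.add(Activation('relu'))
finetune_model.add(Dense(512, weights=pretrained_w2, trainable=False))
finetune_model.add(Activation('relu'))
finetune_model.add(Dense(10))                    # new task-specific head
finetune_model.add(Activation('softmax'))
finetune_model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
                       metrics=['accuracy'])
# Fine-tune on the (scarce) labeled data, e.g.:
# finetune_model.fit(X_labeled, Y_labeled, nb_epoch=3, batch_size=128)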