Why is pretraining for a DNN not specified in Keras?

The question is more about the training algorithm for DNNs than about the Keras software itself.

As far as I know, deep neural networks owe their success to improvements in the training algorithm. Since the 1980s the BP (backpropagation) algorithm has been used to train neural networks, but it runs into overfitting when the network is deep. About 10 years ago, Hinton improved the algorithm by first pretraining the network with unlabeled data and only then applying BP; this pretraining plays an important role in avoiding overfitting.
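To make concrete what I mean by pretraining, here is a minimal sketch (my own illustration, not from any Keras example) of greedy layer-wise pretraining with autoencoders, written against the same old-style Keras API as the script below; the layer sizes, epoch counts and optimizer are assumptions, not a recipe:

import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense, Activation

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(X, n_hidden, nb_epoch=5, batch_size=128):
    """Fit a one-hidden-layer autoencoder on unlabeled X; return the encoder
    weights and the encoded representation that feeds the next layer."""
    n_in = X.shape[1]
    ae = Sequential()
    ae.add(Dense(n_hidden, input_shape=(n_in,)))   # encoder
    ae.add(Activation('sigmoid'))
    ae.add(Dense(n_in))                            # decoder reconstructs X
    ae.add(Activation('sigmoid'))
    ae.compile(loss='mse', optimizer='rmsprop')
    ae.fit(X, X, nb_epoch=nb_epoch, batch_size=batch_size, verbose=0)
    W, b = ae.layers[0].get_weights()              # keep only the encoder part
    return [W, b], sigmoid(np.dot(X, W) + b)

# Pretrain two hidden layers on the (unlabeled) inputs, then initialise the
# supervised network with these weights and fine-tune everything with BP/SGD:
# w1, h1 = pretrain_layer(X_train, 512)
# w2, h2 = pretrain_layer(h1, 512)
# model.layers[0].set_weights(w1)
# model.layers[3].set_weights(w2)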

However, when I started playing with Keras, the MNIST DNN example below, trained with the SGD algorithm, reaches very high prediction accuracy without any mention of a pretraining step. So I started to wonder where the pretraining went. Have I misunderstood the training algorithm for deep learning (I think classic BP and SGD are roughly the same thing)? Or has some new training technique replaced the pretraining procedure?

Thank you very much for your help!

'''Trains a simple deep NN on the MNIST dataset.
Gets to 98.40% test accuracy after 20 epochs
(there is *a lot* of margin for parameter tuning).
2 seconds per epoch on a K520 GPU.
'''

from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD, Adam, RMSprop
from keras.utils import np_utils


batch_size = 128
nb_classes = 10
nb_epoch = 20

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()
model.add(Dense(512, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))

model.summary()

model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])

history = model.fit(X_train, Y_train,
                    batch_size=batch_size, nb_epoch=nb_epoch,
                    verbose=1, validation_data=(X_test, Y_test))
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

You are wrong.

Past vs. today

The difference between the neural networks of the past and the neural networks of today is not the training algorithm. Every DNN is trained with backpropagation based on some SGD-style algorithm, exactly as in the past. (There are some newer algorithms that try to reduce parameter tuning with adaptive learning rates, e.g. Adam, RMSprop and co.; but plain SGD is still the most common algorithm and was used for AlphaGo, for example.)
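For instance, a sketch of what that swap looks like for the `model` in the question's script (the learning-rate and momentum values are illustrative assumptions, not tuned settings):

from keras.optimizers import SGD

# Recompile the question's model with plain momentum SGD instead of RMSprop.
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(lr=0.01, momentum=0.9, nesterov=True),
              metrics=['accuracy'])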

The difference is only in size = the number of layers (depth; made possible by GPU-based evaluation) and the choice of activation function. ReLU simply works better than the classical sigmoid or tanh activations (with respect to speed and stability).
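If you want to see this for yourself, a quick sketch: rebuild the question's architecture with sigmoid activations and compare; on MNIST it will typically train more slowly and end up less accurate than the ReLU version (exact numbers depend on the run):

from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation

sigmoid_model = Sequential()
sigmoid_model.add(Dense(512, input_shape=(784,)))
sigmoid_model.add(Activation('sigmoid'))   # was 'relu' in the question
sigmoid_model.add(Dropout(0.2))
sigmoid_model.add(Dense(512))
sigmoid_model.add(Activation('sigmoid'))   # was 'relu' in the question
sigmoid_model.add(Dropout(0.2))
sigmoid_model.add(Dense(10))
sigmoid_model.add(Activation('softmax'))
sigmoid_model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
                      metrics=['accuracy'])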

Pretraining

I also think that pretraining was very popular 5-10 years ago, but nobody does it today (if you have enough data)! Let me quote from here:

It's true that unsupervised pre-training was initially what made it possible to train deeper networks, but the last few years the pre-training approach has been largely obsoleted. Nowadays, deep neural networks are a lot more similar to their 80's cousins. Instead of pre-training, the difference is now in the activation functions and regularisation methods used (and sometimes in the optimisation algorithm, although much more rarely). I would say that the "pre-training era", which started around 2006, ended in the early '10s when people started using rectified linear units (ReLUs), and later dropout, and discovered that pre-training was no longer beneficial for this type of networks.

I can recommend these slides as an introduction to modern deep learning (as a starting point).

Pretraining has actually regained a lot of attention recently in the NLP community; see OpenAI's GPT: the idea is that pretraining acts as an unsupervised initialization step before fine-tuning the model on supervised data. The reason is that unlabeled data is far more abundant than its labeled counterpart, and it can be exploited to derive sensible weights inside the model that capture the hidden links within the structure of the dataset. Hope the explanation is not too dumb :)
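As a toy illustration of that pretrain-then-fine-tune pattern in Keras terms (not GPT itself; the weight arrays and dataset names below are hypothetical stand-ins):

import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense, Activation

# Hypothetical stand-ins for weights obtained during an unsupervised
# pretraining phase; in practice they would come from the pretraining step.
pretrained_w1 = [(0.01 * np.random.randn(784, 512)).astype('float32'),
                 np.zeros(512, dtype='float32')]
pretrained_w2 = [(0.01 * np.random.randn(512, 512)).astype('float32'),
                 np.zeros(512, dtype='float32')]

finetune_model = Sequential()
finetune_model.add(Dense(512, input_shape=(784,), weights=pretrained_w1,
                         trainable=False))       # frozen pretrained layer
finetune_model.add(Activation('relu'))
finetune_model.add(Dense(512, weights=pretrained_w2, trainable=False))
finetune_model.add(Activation('relu'))
finetune_model.add(Dense(10))                    # new task-specific head
finetune_model.add(Activation('softmax'))
finetune_model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
                       metrics=['accuracy'])
# Fine-tune on the (scarce) labeled data, e.g.:
# finetune_model.fit(X_labeled, Y_labeled, nb_epoch=3, batch_size=128)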