Softmax MLP Classifier - which activation function to use in hidden layer?

I am writing a multilayer perceptron from scratch, with just an input layer, a hidden layer, and an output layer. The output layer will use the softmax activation function to produce probabilities for several mutually exclusive outputs.

It does not make sense to me to also use the softmax activation function in my hidden layer - is this correct? If so, can I just use any other non-linear activation function, such as sigmoid or tanh? Or could I even use no activation function at all in the hidden layer, and just keep the values of the hidden nodes as the linear combinations of the input nodes and the input-to-hidden weights?
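For concreteness, here is a minimal NumPy sketch of the forward pass I have in mind (the shapes and the tanh placeholder in the hidden layer are just illustrative; the hidden activation is exactly the part I am unsure about):

```python
import numpy as np

def softmax(z):
    # subtract the row-wise max for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X, W1, b1, W2, b2, hidden_activation=np.tanh):
    # hidden layer: linear combination followed by a non-linearity (the choice in question)
    H = hidden_activation(X @ W1 + b1)
    # output layer: softmax over mutually exclusive classes
    return softmax(H @ W2 + b2)

# illustrative shapes: 4 inputs, 8 hidden units, 3 classes
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)) * 0.1, np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)) * 0.1, np.zeros(3)
probs = forward(rng.normal(size=(5, 4)), W1, b1, W2, b2)
print(probs.sum(axis=1))  # each row sums to 1
```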

You can use any activation function in the hidden layer. Just test a few and pick the one that gives the best results. Don't forget to try ReLU, though: as far as I know it is the simplest option and it works well in practice.
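As a rough sketch of what "test a few" could look like (my own illustration, assuming a NumPy implementation), the usual candidates can be written as interchangeable functions and plugged into the hidden layer one at a time:

```python
import numpy as np

# common hidden-layer activations, all drop-in replacements for one another
activations = {
    "relu":    lambda z: np.maximum(0.0, z),
    "sigmoid": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "tanh":    np.tanh,
}

z = np.linspace(-3, 3, 7)
for name, f in activations.items():
    print(name, np.round(f(z), 3))
```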

In my hidden layer it does not make sense to me to use the softmax activation function too - is this correct?

That is indeed correct.

If so can I just use any other non-linear activation function such as sigmoid or tanh?

Yes, but most modern approaches would call for a Rectified Linear Unit (ReLU) or one of its variants (Leaky ReLU, ELU, etc.).
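For illustration (a sketch assuming a NumPy implementation; the alpha values below are just common defaults, not prescriptions), these variants differ only in how they handle negative inputs:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # small non-zero slope for negative inputs avoids "dead" units
    return np.where(z > 0, z, alpha * z)

def elu(z, alpha=1.0):
    # smooth, saturating negative branch
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z), leaky_relu(z), elu(z), sep="\n")
```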

Or could I even not use any activation function in the hidden layer and just keep the values of the hidden nodes as the linear combinations of the input nodes and input-to-hidden weights?

No. Non-linear activations are precisely what prevents a (possibly huge) neural network from behaving like a single linear unit; it can be shown that (see Andrew Ng's relevant lecture @ Coursera, "Why do you need non-linear activation functions?"):

It turns out that if you use a linear activation function, or alternatively if you don't have an activation function, then no matter how many layers your neural network has, all it's doing is just computing a linear activation function, so you might as well not have any hidden layers.

The take-home is that a linear hidden layer is more or less useless because the composition of two linear functions is itself a linear function; so unless you throw a non-linearity in there then you're not computing more interesting functions even as you go deeper in the network.
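A quick numerical check of that claim (my own sketch, not from the lecture): two stacked layers with no non-linearity in between are exactly equivalent to a single linear layer with merged weights:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 3)), rng.normal(size=3)

# two "layers" with no activation in between ...
two_linear_layers = (X @ W1 + b1) @ W2 + b2
# ... collapse into one linear layer with merged parameters
W, b = W1 @ W2, b1 @ W2 + b2
one_linear_layer = X @ W + b

print(np.allclose(two_linear_layers, one_linear_layer))  # True
```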

In practice, the only place where you can use a linear activation function is the output layer of a regression problem (this is also explained in the lecture above).
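For completeness, a minimal sketch (my own, with illustrative shapes) contrasting the two output heads: an identity (linear) output for regression versus a softmax output for mutually exclusive classes:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))                  # hidden-layer activations
W_reg, b_reg = rng.normal(size=(8, 1)), np.zeros(1)  # regression head: 1 output
W_cls, b_cls = rng.normal(size=(8, 3)), np.zeros(3)  # classification head: 3 classes

y_regression = H @ W_reg + b_reg             # linear (identity) output
p_classes = softmax(H @ W_cls + b_cls)       # softmax output, rows sum to 1
print(y_regression.ravel(), p_classes.sum(axis=1))
```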