Why does LSTM use a sigmoid function to model the gate mechanism instead of a binary value (0/1)?

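In an LSTM, we usually use a sigmoid function to model the gate mechanism (a soft gate). The problem is that in many cases such a function outputs a value around 0.5, which seems meaningless for a gate. Why not use binary values (0/1) in an LSTM? What is the basic idea and intuition behind using the sigmoid function in LSTM and GRU?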

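A sigmoid gate outputs a value between 0 and 1. That value describes how much information should be let through: 0 means "Nothing should get through", and 1 means "Let everything get through". A value in between, such as 0.5, passes that fraction of the signal. For more information, I suggest you take a look at colah's blog.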
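To make the "how much gets through" idea concrete, here is a minimal sketch in plain Python. The names `apply_gate`, `pre_activations`, and `candidate` are made up for illustration and do not come from any library; in a real LSTM the pre-activations would be computed from the input and the previous hidden state with learned weights.

```python
import math

def sigmoid(x):
    # Squashes any real number into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def apply_gate(pre_activations, candidate):
    # Hypothetical helper: each gate value scales one element of the candidate.
    gate = [sigmoid(z) for z in pre_activations]
    return [g * c for g, c in zip(gate, candidate)]

candidate = [2.0, 2.0, 2.0]
# Strongly negative, zero, and strongly positive pre-activations.
pre_activations = [-10.0, 0.0, 10.0]
gated = apply_gate(pre_activations, candidate)
print(gated)  # first element nearly blocked, second halved, third nearly unchanged
```

A gate value of 0.5 is therefore not meaningless: it passes half of that element, and training can push the pre-activation toward either extreme when that helps.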

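A binary function in the network causes problems for backpropagation because it is not a 'nicely differentiable' function. Its derivative is the delta function: zero everywhere except at the threshold, where it is unbounded, which does not work well in numerical computation. The sigmoid, by contrast, has a smooth, nonzero derivative everywhere, so the gate parameters can be learned by gradient descent.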
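The difference in gradients can be checked numerically. The following is a rough sketch, assuming a hard 0/1 threshold at zero as the "binary gate"; `hard_gate` and `numeric_grad` are illustrative names, and a central finite difference stands in for the gradient that backpropagation would compute.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Analytic derivative of the sigmoid: s * (1 - s).
    s = sigmoid(x)
    return s * (1.0 - s)

def hard_gate(x):
    # Assumed binary gate: a step function with its threshold at 0.
    return 1.0 if x > 0 else 0.0

def numeric_grad(f, x, eps=1e-4):
    # Central finite-difference approximation of df/dx.
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

points = [-2.0, -0.5, 0.5, 2.0]
hard_grads = [numeric_grad(hard_gate, x) for x in points]
soft_grads = [numeric_grad(sigmoid, x) for x in points]

print(hard_grads)  # zero away from the threshold: no learning signal
print(soft_grads)  # strictly positive: the gate can be nudged either way
```

With the step function, every point away from the threshold reports a zero gradient, so a gradient-based optimizer gets no information about which way to move the gate's weights.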