How to calculate the number of parameters of an LSTM network?

Question

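Is there a way to calculate the total number of parameters in an LSTM network?

I found an example, but I'm not sure how correct it is, or whether I have understood it properly.

For example, consider the following: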

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import LSTM
model = Sequential()
model.add(LSTM(256, input_dim=4096, input_length=16))
model.summary()

Output

____________________________________________________________________________________________________
Layer (type)                       Output Shape        Param #     Connected to                     
====================================================================================================
lstm_1 (LSTM)                      (None, 256)         4457472     lstm_input_1[0][0]               
====================================================================================================
Total params: 4457472
____________________________________________________________________________________________________

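As I understand it, n is the input vector length and m is the number of time steps. In this example, they consider the number of hidden layers to be 1.

Hence, according to the formula 4(nm+n^2) in the post, in my example m=16; n=4096; num_of_units=256

4*((4096*16)+(4096*4096))*256 = 17246978048

Why is there such a difference? Did I misunderstand the example, or is the formula wrong?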

Answer 1

No - the number of parameters of an LSTM layer in Keras equals:

params = 4 * ((size_of_input + 1) * size_of_output + size_of_output^2)

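The additional 1 comes from the bias terms. So n is the size of the input (increased by the bias term), and m is the size of the output of the LSTM layer.

So finally: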

4 * (4097 * 256 + 256^2) = 4457472
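The formula above can be wrapped in a small helper to check the arithmetic. This is a minimal sketch in plain Python; the function name lstm_param_count is made up for illustration and is not part of Keras.

```python
def lstm_param_count(size_of_input: int, size_of_output: int) -> int:
    """Parameter count of one standard LSTM layer with biases (hypothetical helper)."""
    # Input weights plus one bias per output unit, for a single gate.
    input_weights_and_bias = (size_of_input + 1) * size_of_output
    # Recurrent weights connecting the previous output to a single gate.
    recurrent_weights = size_of_output ** 2
    # Three gates plus the cell candidate give the factor of 4.
    return 4 * (input_weights_and_bias + recurrent_weights)


print(lstm_param_count(4096, 256))  # 4457472
```

Note that the number of time steps (16 in the question) does not appear anywhere in the computation.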

Answer 2

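An expansion of @JohnStrong's formula:

The 4 means we have separate weight and bias variables for the 3 gates (read / write / forget) and, as the 4th, for the cell state (within the same hidden state). (These are shared across the time steps of a particular hidden state vector.)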

4 * lstm_hidden_state_size * (lstm_inputs_size + bias_variable + lstm_outputs_size)

Since the LSTM output (y) is by design h (the hidden state), with no extra projection, for LSTM outputs we have:

lstm_hidden_state_size = lstm_outputs_size

Let's say it is d:

d = lstm_hidden_state_size = lstm_outputs_size

Then

params = 4 * d * ((lstm_inputs_size + 1) + d) = 4 * ((lstm_inputs_size + 1) * d + d^2)

Answer 3

image via this post

num_params = [(num_units + input_dim + 1) * num_units] * 4

num_units + input_dim: concat [h(t-1), x(t)]

+ 1: bias

* 4: there are 4 neural network layers (the yellow boxes) {W_forget, W_input, W_output, W_cell}

model.add(LSTM(units=256, input_dim=4096, input_length=16))

[(256 + 4096 + 1) * 256] * 4 = 4457472

PS: num_units = num_hidden_units = output_dims
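To make the concat view concrete, the four layers can be laid out as NumPy arrays and their elements counted. This is only a sketch: the gate names and the dictionary layout are illustrative assumptions, not how any particular library stores its weights.

```python
import numpy as np

num_units, input_dim = 256, 4096

# One weight matrix and one bias vector per yellow box (names are illustrative).
gates = ["forget", "input", "output", "cell"]

# Each matrix maps the concatenation [h(t-1), x(t)] to num_units values.
weights = {g: np.zeros((num_units, num_units + input_dim), dtype=np.float32) for g in gates}
biases = {g: np.zeros(num_units, dtype=np.float32) for g in gates}

num_params = sum(w.size for w in weights.values()) + sum(b.size for b in biases.values())
print(num_params)  # 4457472
```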

Answer 4

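The LSTM equations (from deeplearning.ai on Coursera)

From the equations it is evident that the final dimensions of all 6 equations are the same, and that this final dimension must equal the dimension of a(t).

Of these 6 equations, only 4 contribute to the number of parameters, and looking at the equations we can deduce that all 4 are symmetric. So if we find the number of parameters for one equation, we can simply multiply it by 4 to get the total number of parameters.

One important point is that the total number of parameters does not depend on the number of time steps (or input_length), because the same "W" and "b" are shared across all time steps.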
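The weight sharing across time steps can be illustrated with a tiny NumPy forward pass. This is a sketch under stated assumptions: the helper names (init_lstm, run_lstm), the row-wise stacking of the four gates, and the gate order are all chosen for illustration and do not mirror any specific library.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_lstm(input_dim, num_units, seed=0):
    rng = np.random.default_rng(seed)
    # W acts on the concatenation [h(t-1), x(t)]; the four gates are stacked row-wise.
    W = rng.normal(scale=0.1, size=(4 * num_units, num_units + input_dim))
    b = np.zeros(4 * num_units)
    return W, b


def run_lstm(W, b, inputs):
    """inputs has shape (time_steps, input_dim); returns the final hidden state."""
    num_units = b.size // 4
    h = np.zeros(num_units)
    c = np.zeros(num_units)
    for x in inputs:  # the same W and b are reused at every time step
        z = W @ np.concatenate([h, x]) + b
        f, i, o, g = np.split(z, 4)  # illustrative gate order
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h


W, b = init_lstm(input_dim=3, num_units=4)
num_params = W.size + b.size  # 4 * (4 * (4 + 3) + 4) = 128

rng = np.random.default_rng(1)
for time_steps in (1, 16, 100):
    h = run_lstm(W, b, rng.normal(size=(time_steps, 3)))
    print(time_steps, h.shape, num_params)
```

The sequence length changes only how many times the loop runs; W and b, and therefore the parameter count, stay the same.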

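Assume that inside the LSTM cell each gate consists of just one layer (as it does in Keras).

Take equation 1 and work through it. Let the number of neurons in the layer be n and the dimension of x be m (not counting the number of examples or time steps). The forget gate will therefore also have dimension n. Now, just as in an ANN, "Wf" has dimension n*(n+m) and "bf" has dimension n. So the total number of parameters for one equation is [{n*(n+m)} + n], and the total number of parameters is 4*[{n*(n+m)} + n]. Expanding the brackets gives 4*(nm + n² + n).

So, plugging your values into the formula (n=256, m=4096), the total number of parameters is 4*((256*256) + (256*4096) + (256)) = 4*(1114368) = 4457472.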

Answer 5

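I think it is easier to understand if we start with a simple RNN.

Assume we have 4 units (please ignore the ... in the network and focus only on the visible units) and the input size (number of dimensions) is 3:

The number of weights is 28 = 16 (num_units * num_units) for the recurrent connections + 12 (input_dim * num_units) for the input. The number of biases is simply num_units.

Recurrence means that each neuron's output is fed back into the whole network, so if we unroll it in time, it looks like two dense layers:

That makes it clear why we have num_units * num_units weights for the recurrent part.

The number of parameters for this simple RNN is 32 = 4 * 4 + 3 * 4 + 4, which can be expressed as num_units * num_units + input_dim * num_units + num_units or num_units * (num_units + input_dim + 1)

Now, for an LSTM, we must multiply this number by 4, since that is the number of sub-parameters inside each unit; @FelixHo's answer illustrates this nicely.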
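The step from the simple RNN to the LSTM can be sketched in a few lines of plain Python. Both function names are hypothetical and exist only for this illustration.

```python
def simple_rnn_param_count(num_units: int, input_dim: int) -> int:
    """Parameters of a simple RNN layer (hypothetical helper)."""
    recurrent_weights = num_units * num_units
    input_weights = input_dim * num_units
    biases = num_units
    return recurrent_weights + input_weights + biases


def lstm_param_count_from_rnn(num_units: int, input_dim: int) -> int:
    """An LSTM has four such weight sets: three gates plus the cell candidate."""
    return 4 * simple_rnn_param_count(num_units, input_dim)


print(simple_rnn_param_count(4, 3))          # 32
print(lstm_param_count_from_rnn(256, 4096))  # 4457472
```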

Answer 6

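Others have pretty much answered it already. But just to clarify further, when creating an LSTM layer, the number of parameters is as follows:

params = 4*((num_features + 1)*num_units + num_units^2)

The +1 is for the additional bias term.

Here num_features is the num_features in the input shape of the LSTM: Input_shape=(window_size,num_features)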
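The same count can be split by weight tensor, starting from the input shape. The sketch below assumes the layout Keras uses for an LSTM layer (a kernel, a recurrent_kernel, and a bias, each with the four gates stacked along the last axis); the helper names are made up for illustration.

```python
import math


def lstm_weight_shapes(input_shape, num_units):
    """Shapes of the three weight tensors, assuming a Keras-style layout."""
    window_size, num_features = input_shape  # window_size does not affect any shape
    return {
        "kernel": (num_features, 4 * num_units),
        "recurrent_kernel": (num_units, 4 * num_units),
        "bias": (4 * num_units,),
    }


def total_params(shapes):
    # Sum the number of elements over all tensors.
    return sum(math.prod(shape) for shape in shapes.values())


shapes = lstm_weight_shapes(input_shape=(16, 4096), num_units=256)
print(shapes)
print(total_params(shapes))  # 4457472
```

Changing window_size from 16 to any other value leaves every shape, and hence the total, unchanged.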
