Why isn't the number of parameters doubled after keras.layers.Bidirectional?

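Below are the code and the results. There are two models; one of them is bidirectional. My question is why the number of parameters of time_distributed_14 (264) is not double that of time_distributed_13 (136). I know that 264 = 136 * 2 - 8. Why do we need the -8 here?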

from keras.models import Sequential
from keras.layers import Dense, Activation, TimeDistributed, Bidirectional, GRU
import numpy as np

InputSize = 15
MaxLen = 64
HiddenSize = 16

OutputSize = 8
n_samples = 1000

model1 = Sequential()
model1.add(GRU(HiddenSize, return_sequences=True, input_shape=(MaxLen, InputSize)))
model1.add(TimeDistributed(Dense(OutputSize)))
model1.add(Activation('softmax'))
model1.compile(loss='categorical_crossentropy', optimizer='rmsprop')


model2 = Sequential()
model2.add(Bidirectional(GRU(HiddenSize, return_sequences=True), input_shape=(MaxLen, InputSize)))
model2.add(TimeDistributed(Dense(OutputSize)))
model2.add(Activation('softmax'))
model2.compile(loss='categorical_crossentropy', optimizer='rmsprop')


print(model1.summary())
print(model2.summary())

Results:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
gru_9 (GRU)                  (None, 64, 16)            1536      
_________________________________________________________________
time_distributed_13 (TimeDis (None, 64, 8)             136       
_________________________________________________________________
activation_6 (Activation)    (None, 64, 8)             0         
=================================================================
Total params: 1,672
Trainable params: 1,672
Non-trainable params: 0
_________________________________________________________________
None
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
bidirectional_7 (Bidirection (None, 64, 32)            3072      
_________________________________________________________________
time_distributed_14 (TimeDis (None, 64, 8)             264       
_________________________________________________________________
activation_7 (Activation)    (None, 64, 8)             0         
=================================================================
Total params: 3,336
Trainable params: 3,336
Non-trainable params: 0
_________________________________________________________________
None

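A Dense layer has not only "weights" but also "biases", and the biases ignore the input size entirely: there is one bias per output unit. Doubling the input from 16 to 32 therefore doubles only the weights, while the 8 biases are counted once. That is where the -8 comes from.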

weights = input * output 
       - regular: = 16*8 = 128
       - bidirec: = 32*8 = 256

biases = output
       - regular: = 8
       - bidirec: = 8

parameters = weights + biases
       - regular: = 128 + 8 = 136
       - bidirec: = 256 + 8 = 264
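
The arithmetic above can be checked with a small sketch in plain Python (no Keras needed). The helper names dense_params and gru_params are made up for illustration, and the GRU formula assumes the classic layout with three gates and a single bias vector per gate, which matches the 1536 reported in the summary above; other Keras versions or GRU settings may report a different count.

```python
def dense_params(input_dim, output_dim):
    # One weight per (input, output) pair plus one bias per output unit.
    return input_dim * output_dim + output_dim


def gru_params(input_dim, units):
    # Assumption: classic GRU layout, i.e. 3 gates, each with an input
    # kernel, a recurrent kernel and a single bias vector.
    return 3 * (input_dim * units + units * units + units)


input_size, hidden_size, output_size = 15, 16, 8

# Dense after a plain GRU sees 16 features; after Bidirectional it sees 32.
regular = dense_params(hidden_size, output_size)
bidirec = dense_params(2 * hidden_size, output_size)
print(regular, bidirec)                # 136 264
print(2 * regular - output_size)       # 264: weights double, biases do not

# The Bidirectional wrapper holds two complete GRUs, so its count does double.
gru = gru_params(input_size, hidden_size)
print(gru, 2 * gru)                    # 1536 3072
```

So the recurrent layer itself doubles exactly (two independent GRUs, each with its own weights and biases), while the Dense layer on top only gets a wider input and keeps the same 8 biases.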