Why is the number of parameters not doubled after keras.layers.Bidirectional?
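Below are the code and the results. There are two models, one of which is bidirectional. My question is why the number of parameters of time_distributed_14 (264) is not double that of time_distributed_13 (136). I know that 264 = 136 * 2 - 8. Why do we need the -8 here?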
from keras.models import Sequential
from keras.layers import Dense, Activation, TimeDistributed, Bidirectional
from keras.layers.recurrent import GRU
import numpy as np
InputSize = 15
MaxLen = 64
HiddenSize = 16
OutputSize = 8
n_samples = 1000
model1 = Sequential()
model1.add(GRU(HiddenSize, return_sequences=True, input_shape=(MaxLen, InputSize)))
model1.add(TimeDistributed(Dense(OutputSize)))
model1.add(Activation('softmax'))
model1.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model2 = Sequential()
model2.add(Bidirectional(GRU(HiddenSize, return_sequences=True), input_shape=(MaxLen, InputSize)))
model2.add(TimeDistributed(Dense(OutputSize)))
model2.add(Activation('softmax'))
model2.compile(loss='categorical_crossentropy', optimizer='rmsprop')
print(model1.summary())
print(model2.summary())
Result:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
gru_9 (GRU) (None, 64, 16) 1536
_________________________________________________________________
time_distributed_13 (TimeDis (None, 64, 8) 136
_________________________________________________________________
activation_6 (Activation) (None, 64, 8) 0
=================================================================
Total params: 1,672
Trainable params: 1,672
Non-trainable params: 0
_________________________________________________________________
None
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
bidirectional_7 (Bidirection (None, 64, 32) 3072
_________________________________________________________________
time_distributed_14 (TimeDis (None, 64, 8) 264
_________________________________________________________________
activation_7 (Activation) (None, 64, 8) 0
=================================================================
Total params: 3,336
Trainable params: 3,336
Non-trainable params: 0
_________________________________________________________________
None
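A Dense layer has not only weights but also biases, and the number of biases depends only on the output size, not on the input size. Making the GRU bidirectional doubles the Dense layer's input (16 to 32), so the weights double but the 8 biases stay the same. That is where the -8 comes from.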
weights = input * output
- regular: = 16*8 = 128
- bidirec: = 32*8 = 256
biases = output
- regular: = 8
- bidirec: = 8
parameters = weights + biases
- regular: = 128 + 8 = 136
- bidirec: = 256 + 8 = 264
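The arithmetic above can be checked with a few lines of plain Python. This is only a sketch: `dense_params` and `gru_params` are hypothetical helper names, not Keras functions, and the GRU formula assumes the classic GRU variant with one bias vector per gate, which is consistent with the 1536 shown in the summary.

```python
def dense_params(input_dim, output_dim):
    # One weight per (input, output) pair, plus one bias per output unit.
    return input_dim * output_dim + output_dim


def gru_params(input_dim, units):
    # Assumption: classic GRU with 3 gates, each having an input kernel,
    # a recurrent kernel, and a single bias vector.
    return 3 * (units * (input_dim + units) + units)


input_size, hidden_size, output_size = 15, 16, 8

# The Dense layer sees 16 features after a plain GRU and 32 after Bidirectional.
regular = dense_params(hidden_size, output_size)
bidirec = dense_params(2 * hidden_size, output_size)
print(regular, bidirec)        # 136 264
print(2 * regular - bidirec)   # 8, the bias vector that is not duplicated

# The recurrent layer itself does double, because Bidirectional holds two
# full GRU copies, each with its own weights and biases.
gru_single = gru_params(input_size, hidden_size)
print(gru_single, 2 * gru_single)  # 1536 3072
```

So the Bidirectional wrapper doubles its own parameters (1536 to 3072), while the Dense layer after it only doubles its weight matrix and keeps a single bias vector of size 8.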