Keras custom layer gives abnormal results
I am trying to understand how Keras custom layers work. I am trying to create a multiplication layer that takes a scalar input and multiplies it with a multiplicand. I generate some random data and want to learn the multiplicand. When I try it with 10 numbers, it works fine. However, when I try it with 20 numbers, the loss just explodes.
from keras import backend as K
from keras.engine.topology import Layer
from keras import initializers

class MultiplicationLayer(Layer):
    def __init__(self, **kwargs):
        super(MultiplicationLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        self.kernel = self.add_weight(name='multiplicand',
                                      shape=(1,),
                                      initializer='glorot_uniform',
                                      trainable=True)
        self.built = True

    def call(self, x):
        return self.kernel * x

    def compute_output_shape(self, input_shape):
        return input_shape
Using TensorFlow backend.
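As a quick sanity check of the layer in isolation (a sketch of my own, assuming the class above; set_weights just forces the multiplicand to a known value, and predict works without compiling):

from keras.layers import Input
from keras.models import Model
import numpy as np

inp = Input(shape=(1,))
check_model = Model(inp, MultiplicationLayer()(inp))
check_model.layers[1].set_weights([np.array([3.0], dtype=np.float32)])  # multiplicand := 3
print(check_model.predict(np.array([[2.0]])))  # expect [[ 6.]]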
Test model 1 with 10 numbers
from keras.layers import Input
from keras.models import Model
# input is a single scalar
input = Input(shape=(1,))
multiply = MultiplicationLayer()(input)
model = Model(input, multiply)
model.compile(optimizer='sgd', loss='mse')
import numpy as np
input_data = np.arange(10)
output_data = 2 * input_data
model.fit(input_data, output_data, epochs=10)
#print(model.layers[1].multiplicand.get_value())
print(model.layers[1].get_weights())
Epoch 1/10
10/10 [==============================] - 7s - loss: 257.6145
Epoch 2/10
10/10 [==============================] - 0s - loss: 47.6329
Epoch 3/10
10/10 [==============================] - 0s - loss: 8.8073
Epoch 4/10
10/10 [==============================] - 0s - loss: 1.6285
Epoch 5/10
10/10 [==============================] - 0s - loss: 0.3011
Epoch 6/10
10/10 [==============================] - 0s - loss: 0.0557
Epoch 7/10
10/10 [==============================] - 0s - loss: 0.0103
Epoch 8/10
10/10 [==============================] - 0s - loss: 0.0019
Epoch 9/10
10/10 [==============================] - 0s - loss: 3.5193e-04
Epoch 10/10
10/10 [==============================] - 0s - loss: 6.5076e-05
[array([ 1.99935019], dtype=float32)]
Test model 2 with 20 numbers
from keras.layers import Input
from keras.models import Model
# input is a single scalar
input = Input(shape=(1,))
multiply = MultiplicationLayer()(input)
model = Model(input, multiply)
model.compile(optimizer='sgd', loss='mse')
import numpy as np
input_data = np.arange(20)
output_data = 2 * input_data
model.fit(input_data, output_data, epochs=10)
#print(model.layers[1].multiplicand.get_value())
print(model.layers[1].get_weights())
Epoch 1/10
20/20 [==============================] - 0s - loss: 278.2014
Epoch 2/10
20/20 [==============================] - 0s - loss: 601.1653
Epoch 3/10
20/20 [==============================] - 0s - loss: 1299.0583
Epoch 4/10
20/20 [==============================] - 0s - loss: 2807.1353
Epoch 5/10
20/20 [==============================] - 0s - loss: 6065.9375
Epoch 6/10
20/20 [==============================] - 0s - loss: 13107.8828
Epoch 7/10
20/20 [==============================] - 0s - loss: 28324.8320
Epoch 8/10
20/20 [==============================] - 0s - loss: 61207.1250
Epoch 9/10
20/20 [==============================] - 0s - loss: 132262.4375
Epoch 10/10
20/20 [==============================] - 0s - loss: 285805.9688
[array([-68.71629333], dtype=float32)]
Any idea why this happens?
You can solve this by using another optimizer, such as Adam(lr=0.1). Unfortunately, that one demands 100 epochs.... Or you can use a smaller learning rate in SGD, such as SGD(lr=0.001).
from keras.optimizers import *
# input is a single scalar
inp = Input(shape=(1,))
multiply = MultiplicationLayer()(inp)
model = Model(inp, multiply)
model.compile(optimizer=Adam(lr=0.1), loss='mse')
import numpy as np
input_data = np.arange(20)
output_data = 2 * input_data
model.fit(input_data, output_data, epochs=100)
#print(model.layers[1].multiplicand.get_value())
print(model.layers[1].get_weights())
Testing further, I noticed that SGD(lr=0.001) also works, while SGD(lr=0.01) blows up.
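For completeness, here is the SGD variant just mentioned; it is the same sketch as the Adam version with only the optimizer swapped (with lr=0.001 the weight should move toward 2, though I have not tuned the epoch count):

from keras.optimizers import SGD

inp = Input(shape=(1,))
multiply = MultiplicationLayer()(inp)
model = Model(inp, multiply)
model.compile(optimizer=SGD(lr=0.001), loss='mse')

input_data = np.arange(20)
output_data = 2 * input_data
model.fit(input_data, output_data, epochs=10)
print(model.layers[1].get_weights())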
My guess is: if the learning rate is big enough that an update overshoots the target point by a greater distance than you started from, the next step will see an even greater gradient, which overshoots the point by an even greater distance again, and so on.
An example with just one number:
inputNumber = 20
x = currentMultiplicand = 1
targetValue = 40
lr = 0.01
#first step (x=1):
mse = (40-20x)² = 400
gradient = -2*(40-20x)*20 = -800
update = - lr * gradient = 8
new x = 9
#second step (x=9):
mse = (40-20x)² = 19600 #(!!!!!)
gradient = -2*(40-20x)*20 = 5600
update = - lr * gradient = -56
new x = -47
#you can see from here that this is not going to be contained anymore...
The same example with a lower learning rate:
inputNumber = 20
x = currentMultiplicand = 1
targetValue = 40
lr = 0.001
#first step (x=1):
mse = (40-20x)² = 400
gradient = -2*(40-20x)*20 = -800
update = - lr * gradient = 0.8
new x = 1.8
#second step (x=1.8):
mse = (40-20x)² = 16 #(now this is better)
gradient = -2*(40-20x)*20 = -160
update = - lr * gradient = 0.16 #(decreasing update sizes....)
new x = 1.96
#you can see from here that this is converging...
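To check the guess numerically, here is a small standalone sketch of my own (not part of the original answer) that replays both hand calculations above. Note that in this one-sample setup the error (w - 2) is multiplied by (1 - 2*lr*x²) at every step, so plain SGD diverges exactly when lr > 1/x², which is 0.0025 for x = 20:

def simulate(lr, steps=3, x=20.0, target=40.0, w=1.0):
    # Plain SGD on mse = (target - w*x)**2 for a single sample.
    for step in range(steps):
        mse = (target - w * x) ** 2
        gradient = -2 * (target - w * x) * x
        w = w - lr * gradient
        print("lr=%s step %d: mse=%.2f, new w=%.3f" % (lr, step + 1, mse, w))

simulate(lr=0.01)   # error multiplier 1 - 2*0.01*400 = -7: oscillates and explodes
simulate(lr=0.001)  # error multiplier 1 - 2*0.001*400 = 0.2: converges toward w = 2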