Why isn't my RNN learning?
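I'm trying to implement a simple RNN in numpy (based on this article), and I'm training it to do binary addition: it adds two 8-bit unsigned integers one bit at a time, starting from the last bit, with the goal that it learns to "carry the one" when necessary. However, it doesn't seem to be learning. For training, I pick two random numbers and forward propagate for 8 steps, feeding in one bit of a and one bit of b at each step and storing the output and hidden-layer values at every time step. I then backpropagate for 8 steps, computing the hidden-layer error ((output_error.dot(weights_hidden_to_output.T)) * sigmoid_to_derivative(hidden) + future_hidden_error.dot(weights_hidden_to_hidden.T)) and the update for each weight matrix, which is the parent layer matrix-multiplied by the error of the child layer. Is this the correct approach?
Here is my code, in case it makes things clearer. I've noticed that, for some reason, every time I train it the weights suddenly blow up and overflow the sigmoid function, which makes training fail. Any idea what could be causing this?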
import numpy as np
np.random.seed(0)

def sigmoid(x):
    return np.atleast_2d(1/(1+np.exp(-x)))
    #return np.atleast_2d(np.max(x, 0.01))

def sig_deriv(x):
    return x*(1-x)

def add_bias(x):
    return np.hstack([np.ones((len(x), 1)), x])

def dec_to_bin(dec):
    return np.array(map(int, list(format(dec, '#010b'))[2:]))

def bin_to_dec(b):
    out = 0
    for bit in b:
        out = (out << 1) | bit
    return out

batch_size = 8
learning_rate = .1
input_size = 2
hidden_size = 16
output_size = 1

weights_xh = 2 * np.random.random((input_size+1, hidden_size)) - 1
weights_hh = 2 * np.random.random((hidden_size+1, hidden_size)) - 1
weights_hy = 2 * np.random.random((hidden_size+1, output_size)) - 1

xh_update = np.zeros_like(weights_xh)
hh_update = np.zeros_like(weights_hh)
hy_update = np.zeros_like(weights_hy)

for i in xrange(10000):
    a = np.random.randint(0, 2**batch_size/2)
    b = np.random.randint(0, 2**batch_size/2)
    sum_ = a+b

    X = add_bias(np.hstack([np.atleast_2d(dec_to_bin(a)).T, np.atleast_2d(dec_to_bin(b)).T]))
    y = np.atleast_2d(dec_to_bin(sum_)).T

    error = 0
    output_errors = []
    outputs = []
    hiddens = [add_bias(np.zeros((1, hidden_size)))]

    #forward propagation through time
    for j in xrange(batch_size):
        hidden = sigmoid(X[-j-1].dot(weights_xh) + hiddens[-1].dot(weights_hh))
        hidden = add_bias(hidden)
        hiddens.append(hidden)
        output = sigmoid(hidden.dot(weights_hy))
        outputs.append(output[0][0])
        output_error = (y[-j-1] - output)
        error += np.abs(output_error[0])
        output_errors.append((output_error * sig_deriv(output)))

    future_hidden_error = np.zeros((1,hidden_size))

    #backward propagation through time
    for j in xrange(batch_size):
        output_error = output_errors[-j-1]
        hidden = hiddens[-j-1]
        prev_hidden = hiddens[-j-2]
        hidden_error = (output_error.dot(weights_hy.T) * sig_deriv(hidden)) + future_hidden_error.dot(weights_hh.T)
        hidden_error = np.delete(hidden_error, 0, 1) #delete bias error
        xh_update += np.atleast_2d(X[j]).T.dot(hidden_error)
        hh_update += prev_hidden.T.dot(hidden_error)
        hy_update += hidden.T.dot(output_error)
        future_hidden_error = hidden_error

    weights_xh += (xh_update * learning_rate)/batch_size
    weights_hh += (hh_update * learning_rate)/batch_size
    weights_hy += (hy_update * learning_rate)/batch_size
    xh_update *= 0
    hh_update *= 0
    hy_update *= 0

    if i%1000==0:
        guess = map(int, map(round, outputs[::-1]))
        print "Iteration {}".format(i)
        print "Error: {}".format(error)
        print "Problem: {} + {} = {}".format(a, b, sum_)
        print "a: {}".format(list(dec_to_bin(a)))
        print "+ b: {}".format(list(dec_to_bin(b)))
        print "Solution: {}".format(map(int, y))
        print "Guess: {} ({})".format(guess, bin_to_dec(guess))
        print
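I figured it out. In case anyone is wondering why it wasn't working: I was multiplying only part of the hidden error (the part coming from the output error) by the derivative of the hidden-layer activation. It now learns the addition problem easily in just a few thousand iterations.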
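For anyone who wants to see the difference concretely, here is a minimal sketch of the corrected hidden-error step, checked against a numerical gradient. It is not the exact code from the question: it assumes Python 3, drops the bias columns for brevity, uses a squared-error loss, and the helper names (forward, loss, hh_update, numeric_hh_update) are made up for this illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sig_deriv(activation):
    # Sigmoid derivative written in terms of the sigmoid's output.
    return activation * (1.0 - activation)


def forward(xs, w_xh, w_hh, w_hy):
    # hiddens[0] is the initial all-zero state; hiddens[t + 1] belongs to step t.
    hiddens = [np.zeros((1, w_hh.shape[0]))]
    outputs = []
    for x in xs:
        hiddens.append(sigmoid(x.dot(w_xh) + hiddens[-1].dot(w_hh)))
        outputs.append(sigmoid(hiddens[-1].dot(w_hy)))
    return hiddens, outputs


def loss(xs, ys, w_xh, w_hh, w_hy):
    # Assumed loss for this sketch: half the summed squared error.
    _, outputs = forward(xs, w_xh, w_hh, w_hy)
    return 0.5 * sum(((y - o) ** 2).sum() for y, o in zip(ys, outputs))


def hh_update(xs, ys, w_xh, w_hh, w_hy, fixed=True):
    # Accumulated update for w_hh (the negative loss gradient) via BPTT.
    hiddens, outputs = forward(xs, w_xh, w_hh, w_hy)
    update = np.zeros_like(w_hh)
    future_hidden_error = np.zeros((1, w_hh.shape[0]))
    for t in reversed(range(len(xs))):
        output_error = (ys[t] - outputs[t]) * sig_deriv(outputs[t])
        hidden, prev_hidden = hiddens[t + 1], hiddens[t]
        from_output = output_error.dot(w_hy.T)
        from_future = future_hidden_error.dot(w_hh.T)
        if fixed:
            # Both contributions pass through the hidden activation,
            # so both are scaled by its derivative.
            hidden_error = (from_output + from_future) * sig_deriv(hidden)
        else:
            # The version from the question: only the output part is scaled.
            hidden_error = from_output * sig_deriv(hidden) + from_future
        update += prev_hidden.T.dot(hidden_error)
        future_hidden_error = hidden_error
    return update


def numeric_hh_update(xs, ys, w_xh, w_hh, w_hy, eps=1e-6):
    # Central-difference estimate of the negative gradient of the loss.
    update = np.zeros_like(w_hh)
    for idx in np.ndindex(*w_hh.shape):
        plus, minus = w_hh.copy(), w_hh.copy()
        plus[idx] += eps
        minus[idx] -= eps
        grad = (loss(xs, ys, w_xh, plus, w_hy)
                - loss(xs, ys, w_xh, minus, w_hy)) / (2 * eps)
        update[idx] = -grad
    return update


rng = np.random.RandomState(0)
input_size, hidden_size, output_size, steps = 2, 4, 1, 8
w_xh = rng.uniform(-1, 1, (input_size, hidden_size))
w_hh = rng.uniform(-1, 1, (hidden_size, hidden_size))
w_hy = rng.uniform(-1, 1, (hidden_size, output_size))
xs = [rng.randint(0, 2, (1, input_size)).astype(float) for _ in range(steps)]
ys = [rng.randint(0, 2, (1, output_size)).astype(float) for _ in range(steps)]

numeric = numeric_hh_update(xs, ys, w_xh, w_hh, w_hy)
fixed_gap = np.abs(hh_update(xs, ys, w_xh, w_hh, w_hy, fixed=True) - numeric).max()
buggy_gap = np.abs(hh_update(xs, ys, w_xh, w_hh, w_hy, fixed=False) - numeric).max()
print("fixed vs numeric:", fixed_gap)
print("buggy vs numeric:", buggy_gap)
```

In the version from the question, the error flowing back from the future time step skips the derivative factor, which is at most 0.25 for a sigmoid. The recurrent term is therefore never damped as it travels back through the 8 steps, which is at least consistent with the weights blowing up. In the question's code the equivalent change would be to wrap both terms in parentheses before multiplying by sig_deriv(hidden).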