如何使用 LSTM 单元训练 RNN 以进行时间序列预测
How to train a RNN with LSTM cells for time series prediction
我目前正在尝试构建一个用于预测时间序列的简单模型。目标是使用序列训练模型,以便模型能够预测未来值。
我正在使用 tensorflow 和 lstm 单元来执行此操作。该模型通过时间的截断反向传播进行训练。我的问题是如何构建训练数据。
例如,假设我们想要学习给定的序列:
[1,2,3,4,5,6,7,8,9,10,11,...]
然后我们展开 num_steps=4
的网络。
选项 1
input data label
1,2,3,4 2,3,4,5
5,6,7,8 6,7,8,9
9,10,11,12 10,11,12,13
...
选项 2
input data label
1,2,3,4 2,3,4,5
2,3,4,5 3,4,5,6
3,4,5,6 4,5,6,7
...
选项 3
input data label
1,2,3,4 5
2,3,4,5 6
3,4,5,6 7
...
选项 4
input data label
1,2,3,4 5
5,6,7,8 9
9,10,11,12 13
...
如有任何帮助,我们将不胜感激。
我认为选项 1 最接近 /tensorflow/models/rnn/ptb/reader.py
中的参考实现
def ptb_iterator(raw_data, batch_size, num_steps):
"""Iterate on the raw PTB data.
This generates batch_size pointers into the raw PTB data, and allows
minibatch iteration along these pointers.
Args:
raw_data: one of the raw data outputs from ptb_raw_data.
batch_size: int, the batch size.
num_steps: int, the number of unrolls.
Yields:
Pairs of the batched data, each a matrix of shape [batch_size, num_steps].
The second element of the tuple is the same data time-shifted to the
right by one.
Raises:
ValueError: if batch_size or num_steps are too high.
"""
raw_data = np.array(raw_data, dtype=np.int32)
data_len = len(raw_data)
batch_len = data_len // batch_size
data = np.zeros([batch_size, batch_len], dtype=np.int32)
for i in range(batch_size):
data[i] = raw_data[batch_len * i:batch_len * (i + 1)]
epoch_size = (batch_len - 1) // num_steps
if epoch_size == 0:
raise ValueError("epoch_size == 0, decrease batch_size or num_steps")
for i in range(epoch_size):
x = data[:, i*num_steps:(i+1)*num_steps]
y = data[:, i*num_steps+1:(i+1)*num_steps+1]
yield (x, y)
但是,另一个选项是 select 为每个训练序列随机指向数据数组的指针。
我正准备学习 TensorFlow 中的 LSTM 并尝试实现一个示例,该示例(幸运的是)试图预测由简单数学函数生成的一些时间序列/数字序列。
但我正在使用不同的方式来构造训练数据,受到 Unsupervised Learning of Video Representations using LSTMs:
的启发
选项 5:
input data label
1,2,3,4 5,6,7,8
2,3,4,5 6,7,8,9
3,4,5,6 7,8,9,10
...
除了本文之外,我(尝试)从给定的 TensorFlow RNN 示例中获取灵感。我当前的完整解决方案如下所示:
import math
import random
import numpy as np
import tensorflow as tf
LSTM_SIZE = 64
LSTM_LAYERS = 2
BATCH_SIZE = 16
NUM_T_STEPS = 4
MAX_STEPS = 1000
LAMBDA_REG = 5e-4
def ground_truth_func(i, j, t):
return i * math.pow(t, 2) + j
def get_batch(batch_size):
seq = np.zeros([batch_size, NUM_T_STEPS, 1], dtype=np.float32)
tgt = np.zeros([batch_size, NUM_T_STEPS], dtype=np.float32)
for b in xrange(batch_size):
i = float(random.randint(-25, 25))
j = float(random.randint(-100, 100))
for t in xrange(NUM_T_STEPS):
value = ground_truth_func(i, j, t)
seq[b, t, 0] = value
for t in xrange(NUM_T_STEPS):
tgt[b, t] = ground_truth_func(i, j, t + NUM_T_STEPS)
return seq, tgt
# Placeholder for the inputs in a given iteration
sequence = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS, 1])
target = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS])
fc1_weight = tf.get_variable('w1', [LSTM_SIZE, 1], initializer=tf.random_normal_initializer(mean=0.0, stddev=1.0))
fc1_bias = tf.get_variable('b1', [1], initializer=tf.constant_initializer(0.1))
# ENCODER
with tf.variable_scope('ENC_LSTM'):
lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
initial_state = multi_lstm.zero_state(BATCH_SIZE, tf.float32)
state = initial_state
for t_step in xrange(NUM_T_STEPS):
if t_step > 0:
tf.get_variable_scope().reuse_variables()
# state value is updated after processing each batch of sequences
output, state = multi_lstm(sequence[:, t_step, :], state)
learned_representation = state
# DECODER
with tf.variable_scope('DEC_LSTM'):
lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
state = learned_representation
logits_stacked = None
loss = 0.0
for t_step in xrange(NUM_T_STEPS):
if t_step > 0:
tf.get_variable_scope().reuse_variables()
# state value is updated after processing each batch of sequences
output, state = multi_lstm(sequence[:, t_step, :], state)
# output can be used to make next number prediction
logits = tf.matmul(output, fc1_weight) + fc1_bias
if logits_stacked is None:
logits_stacked = logits
else:
logits_stacked = tf.concat(1, [logits_stacked, logits])
loss += tf.reduce_sum(tf.square(logits - target[:, t_step])) / BATCH_SIZE
reg_loss = loss + LAMBDA_REG * (tf.nn.l2_loss(fc1_weight) + tf.nn.l2_loss(fc1_bias))
train = tf.train.AdamOptimizer().minimize(reg_loss)
with tf.Session() as sess:
sess.run(tf.initialize_all_variables())
total_loss = 0.0
for step in xrange(MAX_STEPS):
seq_batch, target_batch = get_batch(BATCH_SIZE)
feed = {sequence: seq_batch, target: target_batch}
_, current_loss = sess.run([train, reg_loss], feed)
if step % 10 == 0:
print("@{}: {}".format(step, current_loss))
total_loss += current_loss
print('Total loss:', total_loss)
print('### SIMPLE EVAL: ###')
seq_batch, target_batch = get_batch(BATCH_SIZE)
feed = {sequence: seq_batch, target: target_batch}
prediction = sess.run([logits_stacked], feed)
for b in xrange(BATCH_SIZE):
print("{} -> {})".format(str(seq_batch[b, :, 0]), target_batch[b, :]))
print(" `-> Prediction: {}".format(prediction[0][b]))
示例输出如下所示:
### SIMPLE EVAL: ###
# [input seq] -> [target prediction]
# `-> Prediction: [model prediction]
[ 33. 53. 113. 213.] -> [ 353. 533. 753. 1013.])
`-> Prediction: [ 19.74548721 28.3149128 33.11489105 35.06603241]
[ -17. -32. -77. -152.] -> [-257. -392. -557. -752.])
`-> Prediction: [-16.38951683 -24.3657589 -29.49801064 -31.58583832]
[ -7. -4. 5. 20.] -> [ 41. 68. 101. 140.])
`-> Prediction: [ 14.14126873 22.74848557 31.29668617 36.73633194]
...
该模型是一个 LSTM 自动编码器,每个有 2 层。
不幸的是,正如您在结果中看到的那样,该模型没有正确学习序列。我可能只是在某个地方犯了一个严重的错误,或者 1000-10000 的训练步骤对于 LSTM 来说只是少数。正如我所说,我也刚刚开始正确地使用 understand/use LSTM。
但希望这能给你一些实施方面的灵感。
在阅读了几个 LSTM 介绍博客后,例如Jakob Aungiers',选项 3 似乎是无状态 LSTM 的正确选项。
如果您的 LSTM 需要比 num_steps
更早地记住数据,您可以以有状态的方式进行训练 - 有关 Keras 示例,请参见 Philippe Remy's blog post "Stateful LSTM in Keras"。然而,Philippe 没有展示批量大小大于 1 的示例。我想在你的情况下,有状态 LSTM 的批量大小为 4 可以与以下数据一起使用(写为 input -> label
):
batch #0:
1,2,3,4 -> 5
2,3,4,5 -> 6
3,4,5,6 -> 7
4,5,6,7 -> 8
batch #1:
5,6,7,8 -> 9
6,7,8,9 -> 10
7,8,9,10 -> 11
8,9,10,11 -> 12
batch #2:
9,10,11,12 -> 13
...
据此,例如批次 #0 中的第二个样本被正确地重复使用,以继续使用批次 #1 的第二个样本进行训练。
这在某种程度上类似于您的选项 4,但是您并没有使用那里的所有可用标签。
更新:
扩展我的建议,其中 batch_size
等于 num_steps
,Alexis Huet for the case of batch_size
being a divisor of num_steps
, which can be used for larger num_steps
. He describes it nicely 在他的博客上。
我目前正在尝试构建一个用于预测时间序列的简单模型。目标是使用序列训练模型,以便模型能够预测未来值。
我正在使用 tensorflow 和 lstm 单元来执行此操作。该模型通过时间的截断反向传播进行训练。我的问题是如何构建训练数据。
例如,假设我们想要学习给定的序列:
[1,2,3,4,5,6,7,8,9,10,11,...]
然后我们展开 num_steps=4
的网络。
选项 1
input data label
1,2,3,4 2,3,4,5
5,6,7,8 6,7,8,9
9,10,11,12 10,11,12,13
...
选项 2
input data label
1,2,3,4 2,3,4,5
2,3,4,5 3,4,5,6
3,4,5,6 4,5,6,7
...
选项 3
input data label
1,2,3,4 5
2,3,4,5 6
3,4,5,6 7
...
选项 4
input data label
1,2,3,4 5
5,6,7,8 9
9,10,11,12 13
...
如有任何帮助,我们将不胜感激。
我认为选项 1 最接近 /tensorflow/models/rnn/ptb/reader.py
中的参考实现def ptb_iterator(raw_data, batch_size, num_steps):
"""Iterate on the raw PTB data.
This generates batch_size pointers into the raw PTB data, and allows
minibatch iteration along these pointers.
Args:
raw_data: one of the raw data outputs from ptb_raw_data.
batch_size: int, the batch size.
num_steps: int, the number of unrolls.
Yields:
Pairs of the batched data, each a matrix of shape [batch_size, num_steps].
The second element of the tuple is the same data time-shifted to the
right by one.
Raises:
ValueError: if batch_size or num_steps are too high.
"""
raw_data = np.array(raw_data, dtype=np.int32)
data_len = len(raw_data)
batch_len = data_len // batch_size
data = np.zeros([batch_size, batch_len], dtype=np.int32)
for i in range(batch_size):
data[i] = raw_data[batch_len * i:batch_len * (i + 1)]
epoch_size = (batch_len - 1) // num_steps
if epoch_size == 0:
raise ValueError("epoch_size == 0, decrease batch_size or num_steps")
for i in range(epoch_size):
x = data[:, i*num_steps:(i+1)*num_steps]
y = data[:, i*num_steps+1:(i+1)*num_steps+1]
yield (x, y)
但是,另一个选项是 select 为每个训练序列随机指向数据数组的指针。
我正准备学习 TensorFlow 中的 LSTM 并尝试实现一个示例,该示例(幸运的是)试图预测由简单数学函数生成的一些时间序列/数字序列。
但我正在使用不同的方式来构造训练数据,受到 Unsupervised Learning of Video Representations using LSTMs:
的启发选项 5:
input data label
1,2,3,4 5,6,7,8
2,3,4,5 6,7,8,9
3,4,5,6 7,8,9,10
...
除了本文之外,我(尝试)从给定的 TensorFlow RNN 示例中获取灵感。我当前的完整解决方案如下所示:
import math
import random
import numpy as np
import tensorflow as tf
LSTM_SIZE = 64
LSTM_LAYERS = 2
BATCH_SIZE = 16
NUM_T_STEPS = 4
MAX_STEPS = 1000
LAMBDA_REG = 5e-4
def ground_truth_func(i, j, t):
return i * math.pow(t, 2) + j
def get_batch(batch_size):
seq = np.zeros([batch_size, NUM_T_STEPS, 1], dtype=np.float32)
tgt = np.zeros([batch_size, NUM_T_STEPS], dtype=np.float32)
for b in xrange(batch_size):
i = float(random.randint(-25, 25))
j = float(random.randint(-100, 100))
for t in xrange(NUM_T_STEPS):
value = ground_truth_func(i, j, t)
seq[b, t, 0] = value
for t in xrange(NUM_T_STEPS):
tgt[b, t] = ground_truth_func(i, j, t + NUM_T_STEPS)
return seq, tgt
# Placeholder for the inputs in a given iteration
sequence = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS, 1])
target = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS])
fc1_weight = tf.get_variable('w1', [LSTM_SIZE, 1], initializer=tf.random_normal_initializer(mean=0.0, stddev=1.0))
fc1_bias = tf.get_variable('b1', [1], initializer=tf.constant_initializer(0.1))
# ENCODER
with tf.variable_scope('ENC_LSTM'):
lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
initial_state = multi_lstm.zero_state(BATCH_SIZE, tf.float32)
state = initial_state
for t_step in xrange(NUM_T_STEPS):
if t_step > 0:
tf.get_variable_scope().reuse_variables()
# state value is updated after processing each batch of sequences
output, state = multi_lstm(sequence[:, t_step, :], state)
learned_representation = state
# DECODER
with tf.variable_scope('DEC_LSTM'):
lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
state = learned_representation
logits_stacked = None
loss = 0.0
for t_step in xrange(NUM_T_STEPS):
if t_step > 0:
tf.get_variable_scope().reuse_variables()
# state value is updated after processing each batch of sequences
output, state = multi_lstm(sequence[:, t_step, :], state)
# output can be used to make next number prediction
logits = tf.matmul(output, fc1_weight) + fc1_bias
if logits_stacked is None:
logits_stacked = logits
else:
logits_stacked = tf.concat(1, [logits_stacked, logits])
loss += tf.reduce_sum(tf.square(logits - target[:, t_step])) / BATCH_SIZE
reg_loss = loss + LAMBDA_REG * (tf.nn.l2_loss(fc1_weight) + tf.nn.l2_loss(fc1_bias))
train = tf.train.AdamOptimizer().minimize(reg_loss)
with tf.Session() as sess:
sess.run(tf.initialize_all_variables())
total_loss = 0.0
for step in xrange(MAX_STEPS):
seq_batch, target_batch = get_batch(BATCH_SIZE)
feed = {sequence: seq_batch, target: target_batch}
_, current_loss = sess.run([train, reg_loss], feed)
if step % 10 == 0:
print("@{}: {}".format(step, current_loss))
total_loss += current_loss
print('Total loss:', total_loss)
print('### SIMPLE EVAL: ###')
seq_batch, target_batch = get_batch(BATCH_SIZE)
feed = {sequence: seq_batch, target: target_batch}
prediction = sess.run([logits_stacked], feed)
for b in xrange(BATCH_SIZE):
print("{} -> {})".format(str(seq_batch[b, :, 0]), target_batch[b, :]))
print(" `-> Prediction: {}".format(prediction[0][b]))
示例输出如下所示:
### SIMPLE EVAL: ###
# [input seq] -> [target prediction]
# `-> Prediction: [model prediction]
[ 33. 53. 113. 213.] -> [ 353. 533. 753. 1013.])
`-> Prediction: [ 19.74548721 28.3149128 33.11489105 35.06603241]
[ -17. -32. -77. -152.] -> [-257. -392. -557. -752.])
`-> Prediction: [-16.38951683 -24.3657589 -29.49801064 -31.58583832]
[ -7. -4. 5. 20.] -> [ 41. 68. 101. 140.])
`-> Prediction: [ 14.14126873 22.74848557 31.29668617 36.73633194]
...
该模型是一个 LSTM 自动编码器,每个有 2 层。
不幸的是,正如您在结果中看到的那样,该模型没有正确学习序列。我可能只是在某个地方犯了一个严重的错误,或者 1000-10000 的训练步骤对于 LSTM 来说只是少数。正如我所说,我也刚刚开始正确地使用 understand/use LSTM。 但希望这能给你一些实施方面的灵感。
在阅读了几个 LSTM 介绍博客后,例如Jakob Aungiers',选项 3 似乎是无状态 LSTM 的正确选项。
如果您的 LSTM 需要比 num_steps
更早地记住数据,您可以以有状态的方式进行训练 - 有关 Keras 示例,请参见 Philippe Remy's blog post "Stateful LSTM in Keras"。然而,Philippe 没有展示批量大小大于 1 的示例。我想在你的情况下,有状态 LSTM 的批量大小为 4 可以与以下数据一起使用(写为 input -> label
):
batch #0:
1,2,3,4 -> 5
2,3,4,5 -> 6
3,4,5,6 -> 7
4,5,6,7 -> 8
batch #1:
5,6,7,8 -> 9
6,7,8,9 -> 10
7,8,9,10 -> 11
8,9,10,11 -> 12
batch #2:
9,10,11,12 -> 13
...
据此,例如批次 #0 中的第二个样本被正确地重复使用,以继续使用批次 #1 的第二个样本进行训练。
这在某种程度上类似于您的选项 4,但是您并没有使用那里的所有可用标签。
更新:
扩展我的建议,其中 batch_size
等于 num_steps
,Alexis Huet batch_size
being a divisor of num_steps
, which can be used for larger num_steps
. He describes it nicely 在他的博客上。