Character-level bidirectional language model in TensorFlow

Inspired by Andrej Karpathy's Char-RNN, there is a TensorFlow implementation, sherjilozair/char-rnn-tensorflow: Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using Tensorflow. I want to implement a bidirectional character-level language model starting from this code. I changed model.py and wrote a simple piece of code:

import tensorflow as tf
from tensorflow.contrib import rnn, legacy_seq2seq


# Config is assumed to be defined/imported elsewhere in model.py and to hold the
# hyper-parameters (model, rnn_size, num_layers, vocab_size, max_seq_length, grad_clip).
class Model:
    def __init__(self, input_data, targets, seq_length=Config.max_seq_length, training=True):
        # Pick the RNN cell type from the configuration.
        if Config.model == 'rnn':
            cell_fn = rnn.BasicRNNCell
        elif Config.model == 'gru':
            cell_fn = rnn.GRUCell
        elif Config.model == 'lstm':
            cell_fn = rnn.BasicLSTMCell
        elif Config.model == 'nas':
            cell_fn = rnn.NASCell
        else:
            raise Exception("model type not supported: {}".format(Config.model))

        # Separate cell stacks for the forward and backward directions.
        fw_cells = []
        bw_cells = []
        for _ in range(Config.num_layers):
            fw_cell = cell_fn(Config.rnn_size)
            bw_cell = cell_fn(Config.rnn_size)
            fw_cells.append(fw_cell)
            bw_cells.append(bw_cell)

        self.fw_cell = rnn.MultiRNNCell(fw_cells, state_is_tuple=True)
        self.bw_cell = rnn.MultiRNNCell(bw_cells, state_is_tuple=True)

        self.input_data, self.targets = input_data, targets

        with tf.variable_scope('rnnlm'):
            # Forward and backward outputs are concatenated, hence 2 * rnn_size.
            softmax_w = tf.get_variable("softmax_w", [Config.rnn_size * 2, Config.vocab_size])
            softmax_b = tf.get_variable("softmax_b", [Config.vocab_size])

        embedding = tf.get_variable("embedding", [Config.vocab_size, Config.rnn_size])
        inputs = tf.nn.embedding_lookup(embedding, self.input_data)

        # static_bidirectional_rnn expects a list of [batch, rnn_size] tensors, one per time step.
        inputs = tf.unstack(inputs, num=seq_length, axis=1)

        outputs, _, _ = tf.nn.static_bidirectional_rnn(self.fw_cell, self.bw_cell, inputs,
                                                       dtype=tf.float32, scope='rnnlm')
        output = tf.reshape(tf.concat(outputs, 1), [-1, Config.rnn_size * 2])

        self.logits = tf.matmul(output, softmax_w) + softmax_b
        self.probs = tf.nn.softmax(self.logits)

        self.lr = tf.Variable(0.0, trainable=False)

        if training:
            # Weights = sign(target id): positions whose target id is 0 contribute no loss.
            loss = legacy_seq2seq.sequence_loss_by_example(
                    [self.logits],
                    [tf.reshape(self.targets, [-1])],
                    [tf.sign(tf.cast(tf.reshape(self.targets, [-1]), dtype=tf.float32))])
            with tf.name_scope('cost'):
                self.cost = tf.reduce_mean(loss)
            tvars = tf.trainable_variables()
            grads, _ = tf.clip_by_global_norm(tf.gradients(self.cost, tvars), Config.grad_clip)

            with tf.name_scope('optimizer'):
                optimizer = tf.train.AdamOptimizer(self.lr)
            self.train_op = optimizer.apply_gradients(zip(grads, tvars))

During training I see very fast convergence: after about 3000 iterations the loss is down to 0.003. At test time, however, the probability of every character is 1.0. I think there is a mistake somewhere, and I would be glad for some help finding it.

You seem to have set self.lr = tf.Variable(0.0, trainable=False). Try changing it to a nonzero value. Also, if you read the probabilities from self.probs at test time, they should be normalized appropriately.
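Roughly, you can assign a nonzero, decaying learning rate into model.lr each epoch with tf.assign, the way the original char-rnn training loop does. A minimal sketch, assuming the graph has been built and the Model instance is named model; learning_rate, decay_rate and num_epochs are illustrative names, not from your code:

import tensorflow as tf

learning_rate = 0.002
decay_rate = 0.97
num_epochs = 50

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for e in range(num_epochs):
        # Feed a decayed learning rate into the model's lr variable for this epoch.
        sess.run(tf.assign(model.lr, learning_rate * (decay_rate ** e)))
        # ... then run sess.run(model.train_op, feed_dict=...) on each batch ...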

Use the outputs before and after the current position to predict the probability of the current word. In your case, you use the current RNN output to predict the probability of the current word, but in a bidirectional network that output has already seen the very character it is supposed to predict, so the model can simply copy it; that is why the loss collapses and all probabilities go to 1.0.
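A hedged sketch of that idea as a drop-in replacement for the output/logits lines inside __init__: split the concatenated outputs of static_bidirectional_rnn back into forward and backward halves and, at each step, combine the forward output of the previous step with the backward output of the next step, so neither direction has seen the character being predicted. The alignment below assumes targets[t] is the character at position t of the same sequence; adjust the offsets if your targets are shifted.

# Sketch only: realign the bidirectional outputs so the prediction at step t
# never conditions on the character at step t itself.
fw_outputs = [o[:, :Config.rnn_size] for o in outputs]   # forward half of each step
bw_outputs = [o[:, Config.rnn_size:] for o in outputs]   # backward half of each step

combined = []
for t in range(seq_length):
    # At the sequence edges there is no previous/next step; use zeros instead.
    fw_prev = fw_outputs[t - 1] if t > 0 else tf.zeros_like(fw_outputs[0])
    bw_next = bw_outputs[t + 1] if t < seq_length - 1 else tf.zeros_like(bw_outputs[0])
    combined.append(tf.concat([fw_prev, bw_next], axis=1))

output = tf.reshape(tf.concat(combined, 1), [-1, Config.rnn_size * 2])
self.logits = tf.matmul(output, softmax_w) + softmax_b
self.probs = tf.nn.softmax(self.logits)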