Pytorch inconsistent size with pad_packed_sequence, seq2seq

The output sizes I get from an encoder I took from a GitHub repo are not what I expect.

The encoder looks like this:

import pdb

import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


class Encoder(nn.Module):
    r"""Applies a multi-layer LSTM to a variable length input sequence.
    """

    def __init__(self, input_size, hidden_size, num_layers,
                 dropout=0.0, bidirectional=True, rnn_type='lstm'):
        super(Encoder, self).__init__()
        self.input_size = 40
        self.hidden_size = 512
        self.num_layers = 8
        self.bidirectional = True
        self.rnn_type = 'lstm'
        self.dropout = 0.0
        if self.rnn_type == 'lstm':
            self.rnn = nn.LSTM(input_size, hidden_size, num_layers,
                               batch_first=True,
                               dropout=dropout,
                               bidirectional=bidirectional)

    def forward(self, padded_input, input_lengths):
        """
        Args:
            padded_input: N x T x D
            input_lengths: N
        Returns: output, hidden
            - **output**: N x T x H
            - **hidden**: (num_layers * num_directions) x N x H
        """
        total_length = padded_input.size(1)  # get the max sequence length
        packed_input = pack_padded_sequence(padded_input, input_lengths,
                                            batch_first=True, enforce_sorted=False)
        packed_output, hidden = self.rnn(packed_input)
        pdb.set_trace()
        output, _ = pad_packed_sequence(packed_output, batch_first=True, total_length=total_length)
        return output, hidden

So it contains just a single RNN (LSTM) module, and if I print the encoder, this is the output:

LSTM(40, 512, num_layers=8, batch_first=True, bidirectional=True)

So the output should have size 512, right? But when I feed it a tensor of size torch.Size([16, 1025, 40]), that is, 16 samples of 1025 vectors of size 40 each (packed to fit the RNN), the output I get back from the RNN has a new encoding size of 1024, torch.Size([16, 1025, 1024]), when it should be encoded to 512, right?

Am I missing something?
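
For reference, here is a minimal reproduction sketch of what I am seeing, using random data with the shapes described above and the Encoder class defined earlier (the variable names are just for illustration):

import torch

encoder = Encoder(input_size=40, hidden_size=512, num_layers=8)

padded_input = torch.randn(16, 1025, 40)                   # N x T x D
input_lengths = torch.full((16,), 1025, dtype=torch.long)  # all sequences at full length

# Note: forward() calls pdb.set_trace(); type 'c' at the (Pdb) prompt to continue.
output, hidden = encoder(padded_input, input_lengths)
print(output.shape)  # torch.Size([16, 1025, 1024]), not [..., 512]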

Setting bidirectional=True makes the LSTM bidirectional, which means there will be two LSTMs, one going from left to right and the other going from right to left.

From the nn.LSTM documentation - Outputs:

  • output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the LSTM, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence.

    For the unpacked case, the directions can be separated using output.view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.

Because you are using a bidirectional LSTM, your output has size [batch, seq_len, 2 * hidden_size] (batch and seq_len are swapped in your case, since you set batch_first=True). The outputs of the two directions are concatenated so the information from both is available, and you can easily separate them if you want to treat them differently.
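
A quick sketch of that last point (shapes chosen to match the question, names chosen for illustration): a standalone bidirectional nn.LSTM produces the concatenated output, and a view separates the two directions when batch_first=True.

import torch
import torch.nn as nn

batch, seq_len, input_size, hidden_size = 16, 1025, 40, 512
rnn = nn.LSTM(input_size, hidden_size, num_layers=8,
              batch_first=True, bidirectional=True)

x = torch.randn(batch, seq_len, input_size)
output, (h_n, c_n) = rnn(x)
print(output.shape)  # torch.Size([16, 1025, 1024]) == (batch, seq_len, 2 * hidden_size)

# Separate the two directions: index 0 is forward, index 1 is backward.
directions = output.view(batch, seq_len, 2, hidden_size)
forward_out = directions[..., 0, :]   # (batch, seq_len, hidden_size)
backward_out = directions[..., 1, :]  # (batch, seq_len, hidden_size)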