LSTM's expected hidden state dimensions don't take batch size into account

Question

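I have this decoder model, which is supposed to take batches of sentence embeddings (batch size = 50, hidden size = 300) as input and output a batch of one-hot representations of the predicted sentences: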

class DecoderLSTMwithBatchSupport(nn.Module):
        # Your code goes here
        def __init__(self, embedding_size,batch_size, hidden_size, output_size):
            super(DecoderLSTMwithBatchSupport, self).__init__()
            self.hidden_size = hidden_size
            self.batch_size = batch_size
            self.lstm = nn.LSTM(input_size=embedding_size,num_layers=1, hidden_size=hidden_size, batch_first=True)
            self.out = nn.Linear(hidden_size, output_size)
            self.softmax = nn.LogSoftmax(dim=1)

        def forward(self, my_input, hidden):
            print(type(my_input), type(hidden))
            output, hidden = self.lstm(my_input, hidden)
            output = self.softmax(self.out(output[0]))
            return output, hidden

        def initHidden(self):
            return Variable(torch.zeros(1, self.batch_size, self.hidden_size)).cuda()

However, when I run it with:

decoder = DecoderLSTMwithBatchSupport(vocabularySize, batch_size, 300, vocabularySize)
decoder.cuda()
decoder_input = np.zeros([batch_size, vocabularySize])
for i in range(batch_size):
    decoder_input[i] = embeddings[SOS_token]
decoder_input = Variable(torch.from_numpy(decoder_input)).cuda()
decoder_hidden = (decoder.initHidden(), decoder.initHidden())
for di in range(target_length):
    decoder_output, decoder_hidden = decoder(decoder_input.view(1, batch_size, -1), decoder_hidden)

I get the following error:

Expected hidden[0] size (1, 1, 300), got (1, 50, 300)

What am I missing to make the model expect a batched hidden state?

Answer 1

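When you create the LSTM, the batch_first flag is not needed, because it makes the LSTM assume a different input shape from the one you pass. With batch_first=True, your (1, batch_size, -1) input is read as a batch of 1 with a sequence length of 50, which is why a hidden state of size (1, 1, 300) is expected. From the docs: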

If True, then the input and output tensors are provided as (batch, seq, feature). Default: False

Change the LSTM creation to:

self.lstm = nn.LSTM(input_size=embedding_size, num_layers=1, hidden_size=hidden_size)
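To check the effect of the flag, here is a minimal CPU-only sketch that feeds the same (1, batch, feature) input to both variants. The sizes are made up for illustration, and Variable and .cuda() are left out because plain tensors are enough to show the shapes. It assumes a reasonably recent PyTorch, where the shape mismatch is raised as a RuntimeError.

```python
import torch
import torch.nn as nn

batch_size, embedding_size, hidden_size = 50, 8, 300  # illustrative sizes

# One time step for the whole batch, laid out as (seq_len, batch, feature).
step = torch.zeros(1, batch_size, embedding_size)
# Hidden and cell state are always (num_layers, batch, hidden_size).
hidden = (torch.zeros(1, batch_size, hidden_size),
          torch.zeros(1, batch_size, hidden_size))

# Default layout (batch_first=False): dimension 1 is the batch, so this runs.
lstm = nn.LSTM(input_size=embedding_size, num_layers=1, hidden_size=hidden_size)
output, (h_n, c_n) = lstm(step, hidden)
output_shape = tuple(output.shape)

# batch_first=True: dimension 0 is the batch, so the same input looks like
# a single sequence of length 50 and the hidden state no longer matches.
lstm_bf = nn.LSTM(input_size=embedding_size, num_layers=1,
                  hidden_size=hidden_size, batch_first=True)
try:
    lstm_bf(step, hidden)
    mismatch = None
except RuntimeError as err:
    mismatch = str(err)

print(output_shape)
print(mismatch)
```

The first call goes through with the (1, 50, 300) hidden state, while the second one fails with the same kind of message as in the question.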

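There is also a type error. When you create decoder_input with torch.from_numpy(), it has dtype=torch.float64 (the NumPy default), while the LSTM weights and the hidden state from initHidden() have the default dtype=torch.float32. Change the line that creates decoder_input to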

decoder_input = Variable(torch.from_numpy(decoder_input)).cuda().float()
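A quick way to see the dtype issue in isolation is the sketch below. It runs on the CPU, and the array size is an arbitrary stand-in for the one in the question.

```python
import numpy as np
import torch

batch_size, embedding_size = 50, 8  # illustrative sizes

decoder_input = np.zeros([batch_size, embedding_size])  # NumPy defaults to float64
as_double = torch.from_numpy(decoder_input)             # dtype is carried over
as_float = as_double.float()                            # cast to float32

print(as_double.dtype)  # torch.float64
print(as_float.dtype)   # torch.float32
```

Allocating the array with dtype=np.float32 in the first place would avoid the cast.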

With these two changes it should work fine :)

Answer 2

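Change the .view() so that the input is shaped [1, batch size, embedding_size], with 1 as the first dimension. Note that this (seq_len, batch, feature) layout is what nn.LSTM expects only when batch_first is False.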
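The reshape has to agree with the batch_first setting. As a sketch of the other option, assuming you would rather keep batch_first=True from the question, the batch dimension goes first in .view() while the hidden state keeps its (num_layers, batch, hidden_size) shape. Sizes are illustrative and everything runs on the CPU.

```python
import torch
import torch.nn as nn

batch_size, embedding_size, hidden_size = 50, 8, 300  # illustrative sizes

lstm = nn.LSTM(input_size=embedding_size, num_layers=1,
               hidden_size=hidden_size, batch_first=True)

decoder_input = torch.zeros(batch_size, embedding_size)
# Hidden state stays (num_layers, batch, hidden_size) even with batch_first=True.
hidden = (torch.zeros(1, batch_size, hidden_size),
          torch.zeros(1, batch_size, hidden_size))

# (batch, seq_len=1, feature): one time step per sentence in the batch.
step = decoder_input.view(batch_size, 1, -1)
output, hidden = lstm(step, hidden)

print(tuple(output.shape))     # (50, 1, 300)
print(tuple(hidden[0].shape))  # (1, 50, 300)
```

With this layout, output[0] in forward() would pick the first sentence rather than the single time step, so the indexing there would need to become something like output[:, 0].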

Also, you don't need to initialize a zero tensor yourself: if no initial state is passed, PyTorch uses zeros.
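The default-state behaviour can be checked with a small sketch like this one, which compares a call without an initial state against a call with explicit zeros. Sizes and the random input are made up for illustration, and it runs on the CPU.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, embedding_size, hidden_size = 50, 8, 300  # illustrative sizes

lstm = nn.LSTM(input_size=embedding_size, num_layers=1, hidden_size=hidden_size)
step = torch.randn(1, batch_size, embedding_size)

# No initial state passed: h_0 and c_0 default to zeros.
out_default, _ = lstm(step)

# Explicit zero state, like the one initHidden() builds in the question.
zeros = torch.zeros(1, batch_size, hidden_size)
out_explicit, _ = lstm(step, (zeros, zeros))

same = torch.allclose(out_default, out_explicit)
print(same)
```

In a decoding loop this means the first step can be called without a hidden argument, and the state returned by that call is passed to the following steps.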

Tags: python-3.x, lstm, pytorch, batchsize