LSTM/RNN in PyTorch: the relation between the forward method and training the model

I am still quite new to neural networks, so apologies in advance for anything that is unclear below.

In a "standard" LSTM implementation for a language task, we have the following (sorry for the very rough sketch):

class LSTM(nn.Module):
    def __init__(self, *args):
        ...

    def forward(self, input, states):

        lstm_in = self.model['embed'](input)
        lstm_out, hidden = self.model['lstm'](lstm_in, states)

        return lstm_out, hidden

Later, we call this model in the training step:

def train(*args):

    for epoch in range(epochs):
        ...
        # init zero states
        ...
        out, states = model(input, states)
        ...
    return model

Say I have 3 sentences as input:

sents = [["The", "sun", "is", "shiny"],
         ["The", "beach", "was", "very", "windy"],
         ["Computer", "broke", "down", "today"]]
model = train(LSTM, sents)

All the words in all the sentences are converted into embeddings and loaded into the model.
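
For concreteness, a minimal sketch of that conversion, using the sents list above, might look like the following; the vocabulary, the padding index and the embedding size are assumptions made for illustration only:

import torch
import torch.nn as nn

# Hypothetical vocabulary built from the three example sentences,
# with index 0 reserved for padding (an assumption for this sketch).
vocab = {"<pad>": 0, "The": 1, "sun": 2, "is": 3, "shiny": 4,
         "beach": 5, "was": 6, "very": 7, "windy": 8,
         "Computer": 9, "broke": 10, "down": 11, "today": 12}

# words -> indices, padded to the longest sentence (5 words here)
max_len = max(len(s) for s in sents)
ids = torch.tensor([[vocab[w] for w in s] + [0] * (max_len - len(s))
                    for s in sents])              # shape: (3, 5)

embed = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8, padding_idx=0)
embedded = embed(ids)                             # shape: (3, 5, 8)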

Now, the questions:

  1. Does self.model['lstm'] iterate over all the words in all the sentences and produce an output after every word? Or after every sentence?

  2. How does the model tell the 3 sentences apart? For example, after it has processed "The", "sun", "is", "shiny", does something inside 'lstm' (such as the states) get reset so that it starts over?

  3. Is the "out" in the training step after out, states = model(input, states) the output after running all 3 sentences, and therefore the combined "information" of all 3 sentences?

Thanks!

When using an LSTM in PyTorch you will usually use the nn.LSTM module. Here is a simple example, followed by an explanation of what happens inside it:

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()

        self.embedder = nn.Embedding(vocab_size, embed_size)

        # the LSTM's input size must match the embedding dimension
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.embedder(x)

        # every time you pass a new batch of sentences into the model you need to
        # create fresh states (the LSTM, unlike a plain RNN, needs a tuple of two
        # states: the hidden state and the cell state)
        batch_size = x.size(0)
        hidden = (torch.zeros(num_layers, batch_size, hidden_size),
                  torch.zeros(num_layers, batch_size, hidden_size))
        x, hidden = self.lstm(x, hidden)

        # x contains the output states of every timestep;
        # for classification we usually only want the last one
        x = x[:, -1]

        x = self.fc(x)
        x = self.softmax(x)
        return x
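
To connect this back to the three example sentences, here is a rough usage sketch; the hyperparameter values and the padded index batch below are assumptions for illustration, not values the model requires. Because forward builds fresh zero states on every call, each batch starts from a clean state:

# Assumed hyperparameters; the Model class above reads them as module-level names.
vocab_size, embed_size, hidden_size, num_layers, output_size = 13, 8, 16, 1, 2

model = Model()

# The 3 example sentences as padded word indices (shape: batch=3, seq_len=5).
batch = torch.tensor([[1,  2,  3,  4,  0],
                      [1,  5,  6,  7,  8],
                      [9, 10, 11, 12,  0]])

probs = model(batch)
print(probs.shape)   # torch.Size([3, 2]) -- one prediction per sentence, not per word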

So when you look at the nn.LSTM call you see that all N embedded words are passed into it at once, and you get all N outputs (one per timestep) back. That means that inside the lstm call it iterates over every word of the sentence embedding; we just don't see that in the code. As a second return value it also gives you the hidden and cell state of the final timestep, but you don't have to use them further and can ignore them in most cases.

As pseudocode:

def lstm(x):
    hiddenstate = init_with_zeros()
    outputs, hiddenstates = [], []
    for e in x:
        output, hiddenstate = neuralnet(e, hiddenstate)

        outputs.append(output)
        hiddenstates.append(hiddenstate)

    return outputs, hiddenstates

sentence = ["the", "sun", "is", "shiny"]
sentence = embedding(sentence)

outputs, hiddenstates = lstm(sentence)
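
As a quick sanity check with the real nn.LSTM (the sizes below are picked purely for illustration): the first return value holds one output per timestep, while the second return value holds only the hidden and cell state of the final timestep:

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=1, batch_first=True)

x = torch.randn(1, 4, 8)          # 1 sentence, 4 timesteps ("the", "sun", "is", "shiny"), embed size 8
output, (h_n, c_n) = lstm(x)      # states default to zeros when none are passed

print(output.shape)               # torch.Size([1, 4, 16]) -- one output per timestep
print(h_n.shape, c_n.shape)       # torch.Size([1, 1, 16]) each -- final timestep only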