LSTM/RNN in PyTorch: the relation between the forward method and training the model

I am still quite new to neural networks, so apologies in advance for anything that is unclear below.

In a "standard" LSTM implementation for a language task, we have the following (sorry for the very rough sketch):

class LSTM(nn.Module):
    def __init__(self, *args):
        ...

    def forward(self, input, states):

        lstm_in = self.model['embed'](input)
        lstm_out, hidden = self.model['lstm'](lstm_in, states)

        return lstm_out, hidden

Later, we call this model in the training step:

def train(*args):

    for epoch in range(epochs):
        ...
        # init zero states
        ...
        out, states = model(input, states)
        ...
    return model

Say I have 3 sentences as input:

sents = [["The", "sun", "is", "shiny"],
         ["The", "beach", "was", "very", "windy"],
         ["Computer", "broke", "down", "today"]]
model = train(LSTM, sents)

All the words in all the sentences are converted into embeddings and loaded into the model.
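
For concreteness, a minimal sketch of that conversion, using the sents list above, might look like the following; the vocabulary, the padding index and the embedding size are assumptions made for illustration only:

import torch
import torch.nn as nn

# Hypothetical vocabulary built from the three example sentences,
# with index 0 reserved for padding (an assumption for this sketch).
vocab = {"<pad>": 0, "The": 1, "sun": 2, "is": 3, "shiny": 4,
         "beach": 5, "was": 6, "very": 7, "windy": 8,
         "Computer": 9, "broke": 10, "down": 11, "today": 12}

# words -> indices, padded to the longest sentence (5 words here)
max_len = max(len(s) for s in sents)
ids = torch.tensor([[vocab[w] for w in s] + [0] * (max_len - len(s))
                    for s in sents])              # shape: (3, 5)

embed = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8, padding_idx=0)
embedded = embed(ids)                             # shape: (3, 5, 8)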

Now, the questions:

  1. Does self.model['lstm'] iterate over all the words in all the sentences and produce an output after every word? Or after every sentence?

  2. How does the model tell the 3 sentences apart? For example, after it has processed "The", "sun", "is", "shiny", does something inside 'lstm' (such as the states) get reset so that it starts over?

  3. Is the "out" in the training step after out, states = model(input, states) the output after running all 3 sentences, and therefore the combined "information" of all 3 sentences?

Thanks!

When using an LSTM in PyTorch you will usually use the nn.LSTM module. Here is a simple example, followed by an explanation of what happens inside it:

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()

        self.embedder = nn.Embedding(vocab_size, embed_size)

        # the LSTM's input size must match the embedding dimension
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.embedder(x)

        # every time you pass a new batch of sentences into the model you need to
        # create fresh states (the LSTM, unlike a plain RNN, needs a tuple of two
        # states: the hidden state and the cell state)
        batch_size = x.size(0)
        hidden = (torch.zeros(num_layers, batch_size, hidden_size),
                  torch.zeros(num_layers, batch_size, hidden_size))
        x, hidden = self.lstm(x, hidden)

        # x contains the output states of every timestep;
        # for classification we usually only want the last one
        x = x[:, -1]

        x = self.fc(x)
        x = self.softmax(x)
        return x
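
To connect this back to the three example sentences, here is a rough usage sketch; the hyperparameter values and the padded index batch below are assumptions for illustration, not values the model requires. Because forward builds fresh zero states on every call, each batch starts from a clean state:

# Assumed hyperparameters; the Model class above reads them as module-level names.
vocab_size, embed_size, hidden_size, num_layers, output_size = 13, 8, 16, 1, 2

model = Model()

# The 3 example sentences as padded word indices (shape: batch=3, seq_len=5).
batch = torch.tensor([[1,  2,  3,  4,  0],
                      [1,  5,  6,  7,  8],
                      [9, 10, 11, 12,  0]])

probs = model(batch)
print(probs.shape)   # torch.Size([3, 2]) -- one prediction per sentence, not per word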

So when you look at the nn.LSTM call you see that all N embedded words are passed into it at once, and you get all N outputs (one per timestep) back. That means that inside the lstm call it iterates over every word of the sentence embedding; we just don't see that in the code. As a second return value it also gives you the hidden and cell state of the final timestep, but you don't have to use them further and can ignore them in most cases.

As pseudocode:

def lstm(x):
    hiddenstate = init_with_zeros()
    outputs, hiddenstates = [], []
    for e in x:
        output, hiddenstate = neuralnet(e, hiddenstate)

        outputs.append(output)
        hiddenstates.append(hiddenstate)

    return outputs, hiddenstates

sentence = ["the", "sun", "is", "shiny"]
sentence = embedding(sentence)

outputs, hiddenstates = lstm(sentence)
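
As a quick sanity check with the real nn.LSTM (the sizes below are picked purely for illustration): the first return value holds one output per timestep, while the second return value holds only the hidden and cell state of the final timestep:

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=1, batch_first=True)

x = torch.randn(1, 4, 8)          # 1 sentence, 4 timesteps ("the", "sun", "is", "shiny"), embed size 8
output, (h_n, c_n) = lstm(x)      # states default to zeros when none are passed

print(output.shape)               # torch.Size([1, 4, 16]) -- one output per timestep
print(h_n.shape, c_n.shape)       # torch.Size([1, 1, 16]) each -- final timestep only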