LSTM/RNN in PyTorch: the relation between the forward method and training the model
I am still fairly new to neural networks, so apologies in advance for anything unclear below.
In a "standard" LSTM implementation for a language task, we have something like the following (sorry for the very rough sketch):
class LSTM(nn.Module):
    def __init__(self, *args):
        ...

    def forward(self, input, states):
        lstm_in = self.model['embed'](input)
        lstm_out, hidden = self.model['lstm'](lstm_in, states)
        return lstm_out, hidden
Later on, we call this model in the training step:
def train(*args):
    for epoch in range(epochs):
        ....
        *init_zero_states
        ...
        out, states = model(input, states)
        ...
    return model
Say I have 3 sentences as input:
sents = [[The, sun, is, shiny],
         [The, beach, was, very, windy],
         [Computer, broke, down, today]]
model = train(LSTM, sents)
All the words in all the sentences are converted to embeddings and fed into the model.
Now the questions:
Does self.model['lstm'] iterate over all the words in all the sentences and emit one output after each word? Or after each sentence?
How does the model distinguish the 3 sentences? For instance, after it has processed "The", "sun", "is", "shiny", does something in 'lstm' (such as the states) get reset so it starts over?
out, states = model(input, states)
Is the out in the training step above the output obtained after running all 3 sentences, and therefore the combined "information" of all 3 sentences?
Thanks!
When using an LSTM in PyTorch you will usually use the nn.LSTM module. Here is a simple example, followed by an explanation of what happens inside:
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.embedder = nn.Embedding(vocab_size, embed_size)
        # the LSTM consumes the embedding vectors, so its input size is embed_size
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.embedder(x)
        # every time you pass a new sentence into the model you need to create
        # a new hidden state (the LSTM, unlike a plain RNN, needs two states in a tuple)
        hidden = (torch.zeros(num_layers, batch_size, hidden_size),
                  torch.zeros(num_layers, batch_size, hidden_size))
        x, hidden = self.lstm(x, hidden)
        # x contains the output states of every timestep;
        # for classification we usually just want the last one
        x = x[:, -1]
        x = self.fc(x)
        x = self.softmax(x)
        return x
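For completeness, here is a rough sketch of how the class above might be called; the hyperparameter values and the random token batch are made-up assumptions for illustration, not part of the original answer:

# assumed, illustrative values for the free variables used in the class above
vocab_size, embed_size, hidden_size, output_size = 1000, 32, 64, 5
num_layers, batch_size = 1, 3

model = Model()
# a batch of 3 "sentences", each 4 token indices long
tokens = torch.randint(0, vocab_size, (batch_size, 4))
probs = model(tokens)
print(probs.shape)  # torch.Size([3, 5]): one class distribution per sentence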
So when you look at the nn.LSTM call, you see that all N embedded words are passed to it at once, and you get all N outputs (one per timestep) back. That means that inside the lstm call it iterates over all the words of the embedded sentence; we just don't see that loop in the code. It also returns the hidden state after the last timestep, but you don't have to use it any further; in most cases you can ignore it.
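To make that concrete, here is a minimal sketch (with made-up sizes) that calls nn.LSTM directly on one whole embedded sentence:

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=1, batch_first=True)

sentence = torch.randn(1, 4, 8)   # 1 sentence, 4 words, 8-dim embeddings
h0 = torch.zeros(1, 1, 16)        # (num_layers, batch, hidden_size)
c0 = torch.zeros(1, 1, 16)

out, (hn, cn) = lstm(sentence, (h0, c0))
print(out.shape)  # torch.Size([1, 4, 16]): one output vector per word (timestep)
print(hn.shape)   # torch.Size([1, 1, 16]): hidden state after the last word
# for a single-layer, unidirectional LSTM, out[:, -1] equals hn[-1],
# which is why the classification example above only keeps the last timestep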
What happens inside the lstm call, as pseudocode:
def lstm(x):
    # start from a zero hidden state for every new sequence
    hiddenstate = init_with_zeros()
    outputs, hiddenstates = [], []
    for e in x:
        output, hiddenstate = neuralnet(e, hiddenstate)
        outputs.append(output)
        hiddenstates.append(hiddenstate)
    return outputs, hiddenstates
sentence = ["the", "sun", "is", "shiny"]
sentence = embedding(sentence)
outputs, hiddenstates = lstm(sentence)
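Regarding the 3 sentences from the question: nothing inside nn.LSTM resets between them by itself. Every call to forward builds a fresh zero hidden state, and when the sentences are stacked along the batch dimension each one gets its own slice of that state, so they do not share "information". A minimal sketch of this (padding the sentences to a common length of 5 is an assumption for illustration):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=1, batch_first=True)

# 3 "sentences" padded to the same length and stacked into one batch
batch = torch.randn(3, 5, 8)   # (batch=3, seq_len=5, embed=8)
h0 = torch.zeros(1, 3, 16)     # one hidden state per sentence
c0 = torch.zeros(1, 3, 16)

out, (hn, cn) = lstm(batch, (h0, c0))
print(out.shape)  # torch.Size([3, 5, 16]): per-sentence, per-word outputs
print(hn.shape)   # torch.Size([1, 3, 16]): one final state per sentence

# the first sentence processed on its own gives the same result as its row in the batch
out_single, _ = lstm(batch[0:1], (h0[:, 0:1], c0[:, 0:1]))
print(torch.allclose(out_single, out[0:1], atol=1e-6))  # True

So the out you get back in the training step is not a blend of the 3 sentences; it holds a separate sequence of outputs (and a separate final state) for each sentence in the batch.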