Hidden size vs input size in RNN
Premise 1:
Regarding the neurons in an RNN layer, my understanding is that "each time step, every neuron receives both the input vector x(t) and the output vector from the previous time step y(t-1)" [1].
Premise 2:
It is also my understanding that, in PyTorch's GRU layer, input_size and hidden_size mean the following:
- input_size – The number of expected features in the input x
- hidden_size – The number of features in the hidden state h
So, naturally, hidden_size should represent the number of neurons in a GRU layer.
My question:
Given the following GRU layer:
# assume that hidden_size = 3
class Encoder(nn.Module):
    def __init__(self, src_dictionary_size, hidden_size):
        super(Encoder, self).__init__()
        self.embedding = nn.Embedding(src_dictionary_size, hidden_size)
        self.gru = nn.GRU(input_size=hidden_size, hidden_size=hidden_size)
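Assuming hidden_size is 3, my understanding is that the GRU layer above has 3 neurons, each of which receives an input vector of size 3 at every time step.
My question is: why do the hidden_size and input_size arguments have to be equal? That is, why can't each of the 3 neurons accept an input vector of size 5?
Case in point: both of the following produce a size mismatch: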
self.gru = nn.GRU(input_size = hidden_size, hidden_size = hidden_size-1)
self.gru = nn.GRU(input_size = hidden_size, hidden_size = hidden_size+1)
[1] Géron, Aurélien. Hands-On Machine Learning with Scikit-Learn and TensorFlow (p. 388). O'Reilly Media. Kindle Edition.
[3] https://pytorch.org/docs/stable/nn.html#torch.nn.GRU
Full code added for reproducibility:
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class Encoder(nn.Module):
    def __init__(self, src_dictionary_size, hidden_size):
        super(Encoder, self).__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(src_dictionary_size, hidden_size)
        # hidden_size - 1 is the variant that triggers the size mismatch
        self.gru = nn.GRU(input_size=hidden_size, hidden_size=hidden_size - 1)

    def forward(self, pad_seqs, seq_lengths, hidden):
        """
        Args:
          pad_seqs of shape (max_seq_length, batch_size, 1): Padded source sequences.
          seq_lengths: List of sequence lengths.
          hidden of shape (1, batch_size, hidden_size): Initial states of the GRU.
        Returns:
          outputs of shape (max_seq_length, batch_size, hidden_size): Padded outputs of GRU at every step.
          hidden of shape (1, batch_size, hidden_size): Updated states of the GRU.
        """
        embedded_sqs = self.embedding(pad_seqs).squeeze(2)
        packed_sqs = pack_padded_sequence(embedded_sqs, seq_lengths)
        packed_output, h_n = self.gru(packed_sqs, hidden)
        output, input_sizes = pad_packed_sequence(packed_output)
        return output, h_n

    def init_hidden(self, batch_size=1):
        return torch.zeros(1, batch_size, self.hidden_size)

def test_Encoder_shapes():
    hidden_size = 5
    encoder = Encoder(src_dictionary_size=5, hidden_size=hidden_size)
    # maximum word count
    max_seq_length = 4
    # number of sentences
    batch_size = 2
    hidden = encoder.init_hidden(batch_size=batch_size)
    # Padded sequences (sentences of words): a batch of 2 sentences with at most 4 words each.
    pad_seqs = torch.tensor([
        [1, 2],
        [2, 3],
        [3, 0],
        [4, 0]
    ]).view(max_seq_length, batch_size, 1)
    outputs, new_hidden = encoder.forward(pad_seqs=pad_seqs, seq_lengths=[4, 2], hidden=hidden)
    assert outputs.shape == torch.Size([4, batch_size, hidden_size]), f"Bad outputs.shape: {outputs.shape}"
    assert new_hidden.shape == torch.Size([1, batch_size, hidden_size]), f"Bad new_hidden.shape: {new_hidden.shape}"
    print('Success')

test_Encoder_shapes()
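I just solved this, and the error was of my own making.
Conclusion: input_size and hidden_size can differ in size; there is nothing inherently wrong with that. The premises stated in the question are correct.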
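To make the conclusion concrete, here is a minimal sketch (the sizes 5 and 3 are illustrative values taken from the question's wording, not from the failing code). It builds a GRU whose input_size and hidden_size differ and inspects the parameter shapes: the input-to-hidden weights have one column per input feature, the hidden-to-hidden weights have one column per hidden unit, and PyTorch stacks the three gates row-wise, so the two sizes are independent.

```python
import torch
import torch.nn as nn

input_size, hidden_size = 5, 3  # illustrative values: 3 "neurons", inputs of size 5
gru = nn.GRU(input_size=input_size, hidden_size=hidden_size)

# The weights of the three GRU gates are stacked row-wise in each matrix.
print(gru.weight_ih_l0.shape)  # input-to-hidden:  (3 * hidden_size, input_size)
print(gru.weight_hh_l0.shape)  # hidden-to-hidden: (3 * hidden_size, hidden_size)

seq_len, batch_size = 4, 2
x = torch.randn(seq_len, batch_size, input_size)
# The initial hidden state is sized by hidden_size, not by input_size.
h0 = torch.zeros(1, batch_size, hidden_size)

output, h_n = gru(x, h0)
print(output.shape)  # (seq_len, batch_size, hidden_size)
print(h_n.shape)     # (1, batch_size, hidden_size)
```

Each of the 3 hidden units therefore has its own weights over all 5 input features and over all 3 previous hidden values, which matches Premise 1.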
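The problem with the full code in the question is that the GRU's initial hidden state did not have the right dimensions. The initial hidden state must have the same dimensions as the subsequent hidden states. In my example, init_hidden built it from self.hidden_size (5), while the GRU was constructed with hidden_size - 1 (4), so the initial hidden state had shape (1, 2, 5) instead of (1, 2, 4). Here 5 is the dimensionality of the embedding vectors and 4 is the GRU's hidden_size (number of neurons). The corrected code is as follows: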
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class Encoder(nn.Module):
    def __init__(self, src_dictionary_size, input_size, hidden_size):
        super(Encoder, self).__init__()
        self.hidden_size = hidden_size
        # The embedding dimension is the GRU's input_size; it is independent of hidden_size.
        self.embedding = nn.Embedding(src_dictionary_size, input_size)
        self.gru = nn.GRU(input_size=input_size, hidden_size=hidden_size)

    def forward(self, pad_seqs, seq_lengths, hidden):
        """
        Args:
          pad_seqs of shape (max_seq_length, batch_size, 1): Padded source sequences.
          seq_lengths: List of sequence lengths.
          hidden of shape (1, batch_size, hidden_size): Initial states of the GRU.
        Returns:
          outputs of shape (max_seq_length, batch_size, hidden_size): Padded outputs of GRU at every step.
          hidden of shape (1, batch_size, hidden_size): Updated states of the GRU.
        """
        embedded_sqs = self.embedding(pad_seqs).squeeze(2)
        packed_sqs = pack_padded_sequence(embedded_sqs, seq_lengths)
        packed_output, h_n = self.gru(packed_sqs, hidden)
        output, input_sizes = pad_packed_sequence(packed_output)
        return output, h_n

    def init_hidden(self, batch_size=1):
        # Sized by the GRU's hidden_size, so it matches the later hidden states.
        return torch.zeros(1, batch_size, self.hidden_size)

def test_Encoder_shapes():
    hidden_size = 4
    embedding_size = 5
    encoder = Encoder(src_dictionary_size=5, input_size=embedding_size, hidden_size=hidden_size)
    print(encoder)
    max_seq_length = 4
    batch_size = 2
    hidden = encoder.init_hidden(batch_size=batch_size)
    pad_seqs = torch.tensor([
        [1, 2],
        [2, 3],
        [3, 0],
        [4, 0]
    ]).view(max_seq_length, batch_size, 1)
    outputs, new_hidden = encoder.forward(pad_seqs=pad_seqs, seq_lengths=[4, 2], hidden=hidden)
    assert outputs.shape == torch.Size([4, batch_size, hidden_size]), f"Bad outputs.shape: {outputs.shape}"
    assert new_hidden.shape == torch.Size([1, batch_size, hidden_size]), f"Bad new_hidden.shape: {new_hidden.shape}"
    print('Success')

test_Encoder_shapes()
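For anyone who wants to see the mismatch in isolation, the following is a small sketch stripped of the embedding and packing steps (the random input tensor and the variable names are my own, chosen to mirror the shapes above). It passes an initial hidden state sized by the embedding dimension (5) to a GRU with hidden_size 4, which PyTorch rejects with a RuntimeError, and then shows the two ways that work: sizing the state from the GRU's own hidden_size, or omitting it so that it defaults to zeros.

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=5, hidden_size=4)
x = torch.randn(4, 2, 5)  # (max_seq_length, batch_size, input_size)

# Wrong: last dimension is the embedding size (5), not the GRU's hidden_size (4).
bad_h0 = torch.zeros(1, 2, 5)
try:
    gru(x, bad_h0)
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print(f"RuntimeError: {err}")

# Right: size the initial hidden state from the GRU's hidden_size.
good_h0 = torch.zeros(1, 2, gru.hidden_size)
output, h_n = gru(x, good_h0)
print(output.shape, h_n.shape)

# Also fine: omit the initial hidden state, which then defaults to zeros.
output_default, h_n_default = gru(x)
print(torch.allclose(output, output_default))
```

Deriving the initial state from gru.hidden_size, rather than from a separately stored number, makes this kind of mismatch harder to reintroduce.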