Is it possible to freeze only certain embedding weights in the embedding layer in pytorch?
When using GloVe embeddings in an NLP task, some words in the dataset may not exist in GloVe, so we instantiate random weights for these unknown words.
Is it possible to freeze the weights obtained from GloVe and train only the newly instantiated weights?
I only know that we can set:
model.embedding.weight.requires_grad = False
But this makes the new words untrainable.
Or is there a better way to capture the semantics of the words?
1. Divide the embedding into two separate objects
One approach is to use two separate embeddings: one for the pretrained tokens and another for the tokens to be trained.
The GloVe one should be frozen, while the tokens without a pretrained representation are taken from the trainable layer.
This can be done if you format your data so that the pretrained token indices fall in a smaller range than the tokens without a GloVe representation. Say your pretrained indices are in the range [0, 300], while those without a representation are in [301, 500]. I would go with something along these lines:
import numpy as np
import torch

class YourNetwork(torch.nn.Module):
    def __init__(self, glove_embeddings: np.ndarray, how_many_tokens_not_present: int):
        super().__init__()
        # from_pretrained freezes the embedding weights by default (freeze=True)
        self.pretrained_embedding = torch.nn.Embedding.from_pretrained(
            torch.from_numpy(glove_embeddings).float()
        )
        # Separate, trainable embedding for tokens without a GloVe representation
        self.trainable_embedding = torch.nn.Embedding(
            how_many_tokens_not_present, glove_embeddings.shape[1]
        )
        # Rest of your network setup

    def forward(self, batch):
        # Tokens without a pretrained representation must have indices GREATER
        # than or equal to the pretrained ones; adjust your data creation accordingly
        mask = batch >= self.pretrained_embedding.num_embeddings
        # You may want to optimize this; you could probably get away without the
        # clone, though I'm not currently sure how
        pretrained_batch = batch.clone()
        pretrained_batch[mask] = 0
        embedded_batch = self.pretrained_embedding(pretrained_batch)
        # Every token without representation has to be brought into the index
        # range of the trainable embedding
        batch = batch - self.pretrained_embedding.num_embeddings
        # Zero out the ones which already have a pretrained embedding
        batch[~mask] = 0
        non_pretrained_embedded_batch = self.trainable_embedding(batch)
        # Finally, replace the placeholder rows produced by the pretrained
        # embedding with the trainable embeddings
        embedded_batch[mask] = non_pretrained_embedded_batch[mask]
        # Rest of your code
        ...
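For illustration, here is a minimal, hypothetical usage sketch of the network above. The GloVe matrix, vocabulary sizes, and batch values are made-up placeholders, assuming the pretrained indices occupy [0, 300] and the remaining tokens [301, 500]:

import numpy as np
import torch

# Hypothetical sizes: 301 pretrained rows (indices 0-300) of dimension 50,
# plus 200 extra tokens (indices 301-500) that get trainable embeddings
glove_embeddings = np.random.rand(301, 50).astype(np.float32)
net = YourNetwork(glove_embeddings, how_many_tokens_not_present=200)

# A toy batch mixing pretrained indices (<= 300) and new ones (>= 301)
batch = torch.LongTensor([[12, 301, 250, 499],
                          [3, 450, 301, 7]])
output = net(batch)  # the forward above is a sketch; it would need to return the embedded batch

# Only the trainable embedding (and the rest of the network) is optimized,
# since the frozen pretrained weights have requires_grad=False
optimizer = torch.optim.Adam(
    [p for p in net.parameters() if p.requires_grad], lr=1e-3
)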
2. Zero gradients for specified tokens.
This one is a bit tricky, but I think it is pretty concise and easy to implement. So, if you obtain the indices of the tokens that do have a GloVe representation (the rows you want to keep frozen), you can explicitly zero out their gradients after backpropagation, so those rows will not get updated.
import torch
embedding = torch.nn.Embedding(10, 3)
X = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])
values = embedding(X)
loss = values.mean()
# Use whatever loss you want
loss.backward()
# Let's say those indices in your embedding are pretrained (have GloVe representation)
indices = torch.LongTensor([2, 4, 5])
print("Before zeroing out gradient")
print(embedding.weight.grad)
print("After zeroing out gradient")
embedding.weight.grad[indices] = 0
print(embedding.weight.grad)
Output of the second approach:
Before zeroing out gradient
tensor([[0.0000, 0.0000, 0.0000],
[0.0417, 0.0417, 0.0417],
[0.0833, 0.0833, 0.0833],
[0.0417, 0.0417, 0.0417],
[0.0833, 0.0833, 0.0833],
[0.0417, 0.0417, 0.0417],
[0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000],
[0.0417, 0.0417, 0.0417]])
After zeroing out gradient
tensor([[0.0000, 0.0000, 0.0000],
[0.0417, 0.0417, 0.0417],
[0.0000, 0.0000, 0.0000],
[0.0417, 0.0417, 0.0417],
[0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000],
[0.0417, 0.0417, 0.0417]])
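A minimal variation on the same idea, assuming you would rather not remember to zero the gradients manually after every backward pass: register a backward hook on the embedding weight that masks out the pretrained rows automatically. The 10x3 embedding and the index list below are the same toy setup as above.

import torch

embedding = torch.nn.Embedding(10, 3)
# Rows that are pretrained (have a GloVe representation) and should stay frozen
pretrained_indices = torch.LongTensor([2, 4, 5])

def zero_pretrained_grad(grad):
    # Called automatically during backward; the returned tensor replaces the gradient
    grad = grad.clone()
    grad[pretrained_indices] = 0
    return grad

embedding.weight.register_hook(zero_pretrained_grad)

X = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])
loss = embedding(X).mean()
loss.backward()
# The pretrained rows now have zero gradient, so an optimizer step leaves them untouched
print(embedding.weight.grad)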