如何在 Pytorch 中学习嵌入并在以后检索它

Question

我正在构建一个推荐系统，根据每个用户的商品购买历史，我可以预测他们的最佳商品。我有 userIDs 和 itemIDs 以及 userID 购买了多少 itemID。我有数百万用户和数以千计的产品。并非所有产品都已购买（有些产品还没有人购买）。由于用户和项目很大，我不想使用单热向量。我正在使用 pytorch，我想创建和训练嵌入，以便我可以对每个用户-项目对进行预测。我遵循了本教程 https://pytorch.org/tutorials/beginner/nlp/word_embeddings_tutorial.html。如果准确假设正在训练嵌入层，那么我是通过 model.parameters() 方法检索学习的权重还是应该使用 embedding.data.weight 选项？

Answer 1

model.parameters() returns 你 model 的所有 parameters，包括 embeddings。

所以你的model的所有这些parameters都交给了optimizer（下一行），稍后会训练调用 optimizer.step() - 所以是的，你的 embeddings 与网络的所有其他 parameters 一起训练。
（你也可以通过设置冻结某些层，即embedding.weight.requires_grad = False，但这里不是这样的）。

# summing it up:
# this line specifies which parameters are trained with the optimizer
# model.parameters() just returns all parameters
# embedding class weights are also parameters and will thus be trained
optimizer = optim.SGD(model.parameters(), lr=0.001)

您可以看到您的 嵌入权重 也是 Parameter 类型：

import torch
embedding_maxtrix = torch.nn.Embedding(10, 10)
print(type(embedding_maxtrix.weight))

这将输出权重的类型，即Parameter:

<class 'torch.nn.parameter.Parameter'>

我不完全确定检索是什么意思。你的意思是获取单个向量，还是只需要整个矩阵来保存它，或者做其他事情？

embedding_maxtrix = torch.nn.Embedding(5, 5)
# this will get you a single embedding vector
print('Getting a single vector:\n', embedding_maxtrix(torch.LongTensor([0])))
# of course you can do the same for a seqeunce
print('Getting vectors for a sequence:\n', embedding_maxtrix(torch.LongTensor([1, 2, 3])))
# this will give the the whole embedding matrix
print('Getting weights:\n', embedding_maxtrix.weight.data)

输出：

Getting a single vector:
 tensor([[-0.0144, -0.6245,  1.3611, -1.0753,  0.5020]], grad_fn=<EmbeddingBackward>)
Getting vectors for a sequence:
 tensor([[ 0.9277, -0.1879, -1.4999,  0.2895,  0.8367],
        [-0.1167, -2.2139,  1.6918, -0.3483,  0.3508],
        [ 2.3763, -1.3408, -0.9531,  2.2081, -1.5502]],
       grad_fn=<EmbeddingBackward>)
Getting weights:
 tensor([[-0.0144, -0.6245,  1.3611, -1.0753,  0.5020],
        [ 0.9277, -0.1879, -1.4999,  0.2895,  0.8367],
        [-0.1167, -2.2139,  1.6918, -0.3483,  0.3508],
        [ 2.3763, -1.3408, -0.9531,  2.2081, -1.5502],
        [-0.5829, -0.1918, -0.8079,  0.6922, -0.2627]])

我希望这能回答你的问题，你也可以看看文档，在那里你也可以找到一些有用的例子。

https://pytorch.org/docs/stable/nn.html#torch.nn.Embedding

如何在 Pytorch 中学习嵌入并在以后检索它

How to learn the embeddings in Pytorch and retrieve it later

embedding

collaborative-filtering

pytorch