What is the difference between an Embedding layer with a bias immediately afterwards and a Linear layer in PyTorch

Question

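I am reading the book "Deep Learning for Coders with fastai & PyTorch". I'm still confused about what the Embedding module does. It seems like a short and simple network, except I can't wrap my head around what Embedding does differently than Linear without a bias. I know it does some faster computational version of a dot product where one of the matrices is a one-hot encoded matrix and the other is the embedding matrix. Does it do this, in effect, to select a piece of data? Please point out where I am wrong. Here is one of the simple networks shown in the book.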

class DotProduct(Module):
    def __init__(self, n_users, n_movies, n_factors):
        self.user_factors = Embedding(n_users, n_factors)
        self.movie_factors = Embedding(n_movies, n_factors)
        
    def forward(self, x):
        users = self.user_factors(x[:,0])
        movies = self.movie_factors(x[:,1])
        return (users * movies).sum(dim=1)

Answer 1

Embedding

[...] what Embedding does differently than Linear without a bias.

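Basically everything. torch.nn.Embedding is a lookup table; it is essentially the same as a torch.Tensor, with a few twists (such as the option of sparse gradients or a default value at a specified index).

For example: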

import torch

embedding = torch.nn.Embedding(3, 4)

print(embedding.weight)

print(embedding(torch.tensor([1])))

This outputs (the exact values vary with random initialization):

Parameter containing:
tensor([[ 0.1420, -0.1886,  0.6524,  0.3079],
        [ 0.2620,  0.4661,  0.7936, -1.6946],
        [ 0.0931,  0.3512,  0.3210, -0.5828]], requires_grad=True)
tensor([[ 0.2620,  0.4661,  0.7936, -1.6946]], grad_fn=<EmbeddingBackward>)

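So we basically took the row at index 1 of the embedding (the second row, since indices start at 0). It does nothing more than that.

Where is it used?

Usually when we want to encode some meaning in each row (as in word2vec), for example so that semantically close words are close in Euclidean space, and possibly train those rows.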
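To make the lookup-table behaviour concrete, here is a minimal sketch. The seed, the sizes, and the chosen indices are arbitrary assumptions for illustration; it compares calling the layer with indexing its weight tensor directly, and also shows the "default value at a specified index" option via padding_idx.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only so the run is repeatable
embedding = torch.nn.Embedding(3, 4)

# Look up several rows at once; repeated indices are allowed.
indices = torch.tensor([1, 2, 1])
looked_up = embedding(indices)        # shape (3, 4)
indexed = embedding.weight[indices]   # plain tensor indexing on the weight

# Both paths return exactly the same rows.
rows_match = torch.equal(looked_up, indexed)
print(looked_up.shape, rows_match)

# With padding_idx, the row at that index is initialized to zeros.
padded = torch.nn.Embedding(3, 4, padding_idx=0)
pad_row_is_zero = bool((padded.weight[0] == 0).all())
print(pad_row_is_zero)
```

In other words, the forward pass is row selection; the layer form mainly adds a trainable parameter and a few options on top of that.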

Linear

torch.nn.Linear (without bias) also holds a torch.Tensor (weight), but it performs an operation on it (and on the input), which is essentially:

output = input.matmul(weight.t())

each time you call the layer (see the source code and the functional definition of this layer).

Code snippet

The layer in your code snippet basically does this:
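As a quick check of that formula, the sketch below (layer sizes and batch size are arbitrary assumptions) compares the output of a bias-free nn.Linear with the explicit matrix multiplication.

```python
import torch

torch.manual_seed(0)  # arbitrary seed for repeatability
linear = torch.nn.Linear(in_features=4, out_features=2, bias=False)

x = torch.randn(5, 4)  # batch of 5 inputs with 4 features each

out_layer = linear(x)                      # calling the layer
out_manual = x.matmul(linear.weight.t())   # the formula written out

# weight has shape (out_features, in_features), hence the transpose.
matches = torch.allclose(out_layer, out_manual, atol=1e-6)
print(out_layer.shape, matches)
```

Note that the input here is a float tensor of features, whereas an Embedding takes integer indices.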

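- two lookup tables (the user and movie embeddings) are created in __init__
- the layer is called with input of shape (batch_size, 2):
  - the first column contains indices of user embeddings
  - the second column contains indices of movie embeddings
- these embeddings are multiplied element-wise and summed over the factor dimension, returning shape (batch_size,). This differs from nn.Linear, which would return (batch_size, out_features) by matrix-multiplying the input with its weight, rather than taking one row-wise dot product per (user, movie) pair as is done here.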

This is probably used to train both representations (of users and movies) for some recommender-like system.

Other stuff
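For reference, the same model can be sketched in plain PyTorch. This is an illustrative rewrite, not the book's code: it uses torch.nn.Module (which needs the explicit super().__init__() call) in place of fastai's Module, and the sizes and sample batch are made-up assumptions.

```python
import torch
from torch import nn


class DotProduct(nn.Module):
    def __init__(self, n_users, n_movies, n_factors):
        super().__init__()  # required with plain nn.Module
        self.user_factors = nn.Embedding(n_users, n_factors)
        self.movie_factors = nn.Embedding(n_movies, n_factors)

    def forward(self, x):
        users = self.user_factors(x[:, 0])    # (batch_size, n_factors)
        movies = self.movie_factors(x[:, 1])  # (batch_size, n_factors)
        # Element-wise product summed over factors: one score per row.
        return (users * movies).sum(dim=1)    # (batch_size,)


torch.manual_seed(0)  # arbitrary seed for repeatability
model = DotProduct(n_users=10, n_movies=20, n_factors=5)

# Each row is a (user index, movie index) pair; values are hypothetical.
batch = torch.tensor([[0, 3], [7, 19], [2, 2]])
scores = model(batch)

# The first score is the dot product of user row 0 and movie row 3.
manual_first = torch.dot(model.user_factors.weight[0],
                         model.movie_factors.weight[3])
print(scores.shape, torch.allclose(scores[0], manual_first, atol=1e-6))
```

Each output element is a single predicted score for one user/movie pair, which is why the result has shape (batch_size,) rather than (batch_size, out_features).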

I know it does some faster computational version of a dot product where one of the matrices is a one-hot encoded matrix and the other is the embedding matrix.

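No, it isn't. torch.nn.Embedding is an index lookup, not a dot product. It can be thought of in one-hot terms and can also use sparse gradients, but whether that gives any speedup depends on the algorithms involved (and whether they support sparsity).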
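A small numerical check may help relate the two views. The sketch below (sizes and indices are arbitrary assumptions) copies an embedding table into a bias-free nn.Linear and feeds that layer one-hot vectors: the values come out the same as the lookup, even though the lookup itself is plain indexing rather than a matrix multiplication. It also shows the sparse=True option, which makes the weight gradient a sparse tensor.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed for repeatability
n_items, n_factors = 5, 3

embedding = torch.nn.Embedding(n_items, n_factors)
linear = torch.nn.Linear(n_items, n_factors, bias=False)

# Linear stores weight as (out_features, in_features), so transpose the table.
with torch.no_grad():
    linear.weight.copy_(embedding.weight.t())

idx = torch.tensor([4, 0, 2])
one_hot = F.one_hot(idx, num_classes=n_items).float()  # shape (3, 5)

looked_up = embedding(idx)    # row selection
multiplied = linear(one_hot)  # one-hot matrix times the same table
same_values = torch.allclose(looked_up, multiplied)
print(same_values)

# sparse=True: the gradient only stores the rows that were looked up.
sparse_embedding = torch.nn.Embedding(n_items, n_factors, sparse=True)
sparse_embedding(idx).sum().backward()
grad_is_sparse = sparse_embedding.weight.grad.is_sparse
print(grad_is_sparse)
```

Adding the same bias vector after either path would keep the results equal, so the practical difference lies in the input format (integer indices versus dense float vectors) and in how the computation is carried out, not in the resulting values.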


python

oop

deep-learning

pytorch

fast-ai
