Pytorch dot product across rows from different arrays

Question

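I'm trying to write code similar to the positional encoding in the Transformer paper. To do that, I need the following operation:

Given the three matrices below, I want to concatenate them at the row level (i.e., the first row of each matrix stacked together, the second rows stacked together, and so on), then apply the dot product between each resulting matrix and its transpose, and finally flatten the results and stack them together. The following example illustrates this: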

import torch

x = torch.tensor([[1,1,1,1],
                  [2,2,2,2],
                  [3,3,3,3]])
y = torch.tensor([[0,0,0,0],
                  [0,0,0,0],
                  [0,0,0,0]])
z = torch.tensor([[4,4,4,4],
                  [5,5,5,5],
                  [6,6,6,6]])

concat = torch.cat([x, y, z], dim=-1).view(-1, x.shape[-1])
print(concat)

tensor([[1, 1, 1, 1],
        [0, 0, 0, 0],
        [4, 4, 4, 4],
        [2, 2, 2, 2],
        [0, 0, 0, 0],
        [5, 5, 5, 5],
        [3, 3, 3, 3],
        [0, 0, 0, 0],
        [6, 6, 6, 6]])
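
As a side note, the same row-level grouping can probably also be obtained with torch.stack along a new dimension, which yields a [rows, 3, cols] tensor directly; collapsing its first two dimensions reproduces the concat printed above. The following is only a minimal sketch using the tensors from the question; the names stacked and concat_alt are made up for illustration.

```python
import torch

x = torch.tensor([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]])
y = torch.zeros_like(x)
z = torch.tensor([[4, 4, 4, 4], [5, 5, 5, 5], [6, 6, 6, 6]])

# Stack along a new middle dimension: [rows, num_tensors, cols] = [3, 3, 4]
stacked = torch.stack([x, y, z], dim=1)

# Collapsing the first two dimensions interleaves the rows:
# x[0], y[0], z[0], x[1], y[1], z[1], ...
concat_alt = stacked.reshape(-1, x.shape[-1])

# Compare against the cat + view approach used in the question
concat = torch.cat([x, y, z], dim=-1).view(-1, x.shape[-1])
print(torch.equal(concat_alt, concat))
```

Keeping the 3-D stacked tensor around (instead of flattening it to 2-D) also means the later per-group step can operate on it as a batch.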

# Take each group of three rows, apply the dot product with its transpose, flatten, and stack the results.
concat = torch.stack([
            torch.flatten(
                torch.matmul(
                    concat[i:i+3, :], # 3 is the number of tensors (x,y,z)
                    torch.transpose(concat[i:i+3, :], 0, 1))
                )
            for i in range(0, concat.shape[0], 3)
            ])

print(concat)

tensor([[  4,   0,  16,   0,   0,   0,  16,   0,  64],
        [ 16,   0,  40,   0,   0,   0,  40,   0, 100],
        [ 36,   0,  72,   0,   0,   0,  72,   0, 144]])

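This finally gives me the matrix I want. My question is: is there a way to achieve this without using a loop as I did in the last step? I want everything to stay in tensor operations.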

Answer 1

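The loop you introduced is only there to get a "list of slices" of the data, which is practically the same as reshaping it. You are essentially introducing an additional dimension with 3 entries, going from shape [3n, k] to [n, 3, k] (here, from [9, 4] to [3, 3, 4]).
To work with the tensor directly, you can just call .reshape to get that shape. After that, the rest of your code stays almost exactly the same. Because of the extra dimension, the transpose has to change slightly: it now swaps dimensions 1 and 2 instead of 0 and 1.

All in all, what you want can be achieved with: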

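# `concat` here is the [9, 4] tensor from before the loop overwrote it
concat2 = concat.reshape((-1, 3, concat.shape[1]))  # [9, 4] -> [3, 3, 4]
torch.flatten(
  torch.matmul(
    concat2,
    concat2.transpose(1, 2)  # swap the last two dims: [3, 4, 3]
  ),
  start_dim=1,  # keep the batch dimension, flatten each 3x3 result
)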
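
A quick way to convince yourself that the reshape-based version matches the loop is to run both on the example data and compare. This is only a sanity-check sketch; the names looped and batched are introduced here for illustration and do not appear in the question.

```python
import torch

x = torch.tensor([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]])
y = torch.zeros_like(x)
z = torch.tensor([[4, 4, 4, 4], [5, 5, 5, 5], [6, 6, 6, 6]])
concat = torch.cat([x, y, z], dim=-1).view(-1, x.shape[-1])  # shape [9, 4]

# Loop version from the question: one 3x3 product per group of three rows
looped = torch.stack([
    torch.flatten(concat[i:i + 3] @ concat[i:i + 3].T)
    for i in range(0, concat.shape[0], 3)
])

# Batched version: [9, 4] -> [3, 3, 4], then a single batched matmul
concat2 = concat.reshape(-1, 3, concat.shape[1])
batched = torch.flatten(concat2 @ concat2.transpose(1, 2), start_dim=1)

print(batched.shape)                 # torch.Size([3, 9])
print(torch.equal(looped, batched))  # True
```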

Answer 2

torch.einsum makes it a bit easier to matmul along the axes you want.

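# [n, 3*k] -> [n, 3, k]; spelling out the row count avoids relying on
# the number of tensors being equal to the number of rows
c = torch.concat([x, y, z], dim=-1).reshape(x.shape[0], -1, x.shape[-1])
torch.einsum('ijl,ikl->ikj', c, c).reshape(x.shape[0], -1)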

Output

tensor([[  4,   0,  16,   0,   0,   0,  16,   0,  64],
        [ 16,   0,  40,   0,   0,   0,  40,   0, 100],
        [ 36,   0,  72,   0,   0,   0,  72,   0, 144]])
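
If the number of tensors differs from the number of rows, one possible generalization is to stack the inputs along a new dimension and let einsum contract over the column axis. The sketch below assumes all inputs share the same [n, k] shape; rowwise_gram is a hypothetical helper name, not part of PyTorch.

```python
import torch

def rowwise_gram(tensors):
    """Hypothetical helper: for each row index, compute the matrix of dot
    products between that row taken from every tensor, then flatten it."""
    # [n, m, k]: n rows, m tensors, k columns
    stacked = torch.stack(tensors, dim=1)
    # gram[n, a, b] = dot(tensors[a][n], tensors[b][n])
    gram = torch.einsum('nak,nbk->nab', stacked, stacked)
    return gram.flatten(start_dim=1)  # [n, m * m]

# Two tensors with three rows each, so the tensor count differs from the row count
a = torch.tensor([[1., 2.], [3., 4.], [5., 6.]])
b = torch.tensor([[1., 0.], [0., 1.], [1., 1.]])
out = rowwise_gram([a, b])
print(out.shape)  # torch.Size([3, 4])
print(out)
```

Naming the einsum indices after their roles (n for rows, a and b for tensors, k for columns) keeps the contraction readable without having to track transposes.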
