PyTorch: after the embedding layer, Unable to get repr for <class 'torch.Tensor'>

Question

I'm new to PyTorch and am trying to reproduce this project: https://github.com/eXascaleInfolab/ActiveLink

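However, the forward pass (forward()) raises an error that has had me stuck for several days. Here is part of the code (for the complete model code, see https://github.com/eXascaleInfolab/ActiveLink/blob/master/models.py):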

def forward(self, e1, rel, batch_size=None, weights=None):
    # ......
    e1_embedded = self.emb_e(e1).view(-1, 1, 10, 20)
    rel_embedded = self.emb_rel(rel).view(-1, 1, 10, 20)
    stacked_inputs = torch.cat([e1_embedded, rel_embedded], 2)  # out: (128L, 1L, 20L, 20L)
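A minimal, self-contained sketch of the shapes involved in the snippet above, assuming an embedding dimension of 200 (so each vector can be viewed as 10×20) and a batch of 128. The class name TinyConvE and the entity/relation counts are hypothetical and not taken from ActiveLink.

```python
import torch
import torch.nn as nn


class TinyConvE(nn.Module):
    """Hypothetical stand-in for the embedding part of the model's forward()."""

    def __init__(self, num_entities: int, num_relations: int, dim: int = 200):
        super().__init__()
        # Assumption: dim = 200 = 10 * 20, so each embedding can be viewed as a 10x20 "image"
        self.emb_e = nn.Embedding(num_entities, dim)
        self.emb_rel = nn.Embedding(num_relations, dim)

    def forward(self, e1: torch.Tensor, rel: torch.Tensor) -> torch.Tensor:
        # e1, rel: int64 index tensors of shape (batch, 1)
        e1_embedded = self.emb_e(e1).view(-1, 1, 10, 20)      # (batch, 1, 10, 20)
        rel_embedded = self.emb_rel(rel).view(-1, 1, 10, 20)  # (batch, 1, 10, 20)
        # Stack along dim 2 (height) -> (batch, 1, 20, 20)
        return torch.cat([e1_embedded, rel_embedded], 2)


model = TinyConvE(num_entities=1000, num_relations=50)
e1 = torch.randint(0, 1000, (128, 1))   # valid entity indices 0..999
rel = torch.randint(0, 50, (128, 1))    # valid relation indices 0..49
stacked_inputs = model(e1, rel)
print(stacked_inputs.shape)  # torch.Size([128, 1, 20, 20])
```

With every index inside its table, the lookup and the torch.cat both succeed and produce the (128, 1, 20, 20) shape noted in the original comment.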

This gives me the following error (I'm running on the GPU):

THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCTensorMath.cu line=196 error=710 : device-side assert triggered
Traceback (most recent call last):
  File "main.py", line 147, in <module>
    main()
  File "main.py", line 136, in main
    model = run_meta_incremental(config, model, train_batcher, test_rank_batcher)
  File "/home/yonghui/yt/meta_incr_training.py", line 158, in run_meta_incremental
    g = run_inner(config, model, task)
  File "/home/yonghui/yt/meta_incr_training.py", line 120, in run_inner
    pred = model.forward(e1, rel)
  File "/home/yonghui/yt/models.py", line 136, in forward
    stacked_inputs = torch.cat([e1_embedded, rel_embedded], 2)
RuntimeError: cuda runtime error (710) : device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorMath.cu:196
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [5,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [189,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

I used the debugger to try to find the problem: before the embedding, e1 and rel are both int64 tensors of shape torch.Size([128, 1]).

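e1 is embedded without problems and becomes a torch.float32 tensor of shape torch.Size([128, 1, 10, 20]). However, after rel goes through the emb_rel embedding layer, the debugger shows every tensor as "Unable to get repr for <class 'torch.Tensor'>".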

What is going on here, and how can I fix it? Thanks for any help!
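The assertion srcIndex < srcSelectDimSize in the log comes from an index-select kernel, which fires when a lookup uses an index that is not smaller than the size of the table being indexed. Because CUDA kernels run asynchronously, the Python traceback most likely points at a later call that checked for errors (here torch.cat) rather than at the embedding lookup itself, and once a device-side assert has fired, further CUDA operations in that process fail too, which plausibly explains why even the tensor's repr cannot be produced. Setting the environment variable CUDA_LAUNCH_BLOCKING=1 before running the script makes kernel launches synchronous so the traceback lands on the failing line. The sketch below takes another route and repeats the lookup on the CPU, where the same mistake is raised immediately as a Python exception; the helper name lookup_on_cpu and the table size of 50 are assumptions for illustration.

```python
import torch
import torch.nn as nn


def lookup_on_cpu(num_embeddings: int, indices: torch.Tensor) -> str:
    """Hypothetical helper: run an embedding lookup on the CPU, where an
    out-of-range index raises a Python exception right away instead of an
    asynchronous device-side assert."""
    emb = nn.Embedding(num_embeddings, 200)  # module lives on the CPU
    try:
        emb(indices.cpu())
        return "ok"
    except (IndexError, RuntimeError) as exc:  # IndexError in recent PyTorch versions
        return f"{type(exc).__name__}: {exc}"


# Assumed table of 50 relations -> valid indices are 0..49; 50 is one past the end
rel = torch.tensor([[3], [49], [50]])
print(lookup_on_cpu(50, rel))   # reports the out-of-range index
print(lookup_on_cpu(51, rel))   # "ok": a table with 51 rows covers index 50
```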

Answer 1

The error occurs somewhere before this error message is printed, probably in the reshaping.

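Calling view does not change the underlying data; it only changes how that data is "viewed", and it does so lazily. If a different view of the tensor is not possible (for example, because the tensor is not stored contiguously in memory; see the PyTorch forum), it fails at the first point where the tensor's contents are actually used, which in your case is when you try to debug-print the tensor.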

For debugging, consider replacing view with reshape (see).
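A small illustration of the difference between view and reshape, using a transposed (and therefore non-contiguous) tensor as an assumed example. In current PyTorch versions, view raises a RuntimeError as soon as the requested shape is incompatible with the tensor's strides, whereas reshape quietly falls back to a copy in that case.

```python
import torch

x = torch.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]], contiguous
xt = x.t()                          # transpose: shape (3, 2), shares storage, non-contiguous
print(xt.is_contiguous())           # False

view_failed = False
try:
    xt.view(-1)                     # view needs strides compatible with the new shape
except RuntimeError as exc:
    view_failed = True
    print("view failed:", exc)

flat = xt.reshape(-1)               # reshape copies when a view is impossible
print(flat)                         # tensor([0, 3, 1, 4, 2, 5])
```

Note that the output of an nn.Embedding lookup is normally contiguous, so in the question's code view(-1, 1, 10, 20) can succeed as long as the lookup itself is valid.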

Answer 2

The problem was solved by using the debugger to inspect the input tensors.

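Inspecting the tensors before the embedding showed that some of their elements were out of range for the embedding table. Keep in mind that indices start at 0, so the valid range is 0 to num_embeddings - 1, and an index equal to num_embeddings is already out of range.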
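A minimal sketch of the check described in this answer, assuming a relation table with 50 rows. The helper check_indices is hypothetical: it validates the indices before the lookup runs, so an off-by-one ID produces a readable Python error instead of a device-side assert.

```python
import torch
import torch.nn as nn


def check_indices(indices: torch.Tensor, emb: nn.Embedding, name: str) -> None:
    """Hypothetical helper: raise a readable error if any index lies outside
    [0, emb.num_embeddings - 1] before the (possibly CUDA) lookup runs."""
    lo, hi = int(indices.min()), int(indices.max())
    if lo < 0 or hi >= emb.num_embeddings:
        raise ValueError(
            f"{name}: indices span [{lo}, {hi}], but the table only has "
            f"{emb.num_embeddings} rows (valid range 0..{emb.num_embeddings - 1})"
        )


emb_rel = nn.Embedding(num_embeddings=50, embedding_dim=200)
rel = torch.tensor([[0], [12], [50]])  # 50 is off by one: indices start at 0

try:
    check_indices(rel, emb_rel, "rel")
except ValueError as exc:
    print(exc)

# Assumption: the IDs themselves are correct and the table is simply too small.
# Sizing it as max_id + 1 gives every ID its own row.
emb_rel_fixed = nn.Embedding(num_embeddings=int(rel.max()) + 1, embedding_dim=200)
check_indices(rel, emb_rel_fixed, "rel")  # passes
```

Whether to enlarge the table or to fix the ID mapping depends on where the out-of-range IDs come from; the check only makes the problem visible.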

