PyTorch crashes when training: probable image decoding error, tensor value, corrupt image. (Runtime Error)

Premise

I'm fairly new to using PyTorch, and I frequently hit segmentation faults while training my neural network on a small custom dataset (10 images, 90-class classification).

The output below comes from these print statements over two runs (the MNIST dataset at idx 0, and my custom dataset at idx 0). Both datasets were compiled using CSV files with exactly the same format (img_name, class); the image directory for the MNIST subset holds 30 images, while my custom dataset holds 10:

example, label = dataset[0]
print(dataset[0])
print(example.shape)
print(label)
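
For reference, both annotations CSVs use the same (img_name, class) layout; a hypothetical two-row example (placeholder file names, not my actual data; pd.read_csv() in the Dataset class further down treats the first line as a header by default):

img_name,class
example_image_01.png,1
example_image_02.png,20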

The first tensor is an MNIST 28x28 PNG converted to a tensor using:

image = torchvision.io.read_image(img_path).type(torch.FloatTensor)

This gives me a known-working dataset to compare against. It uses the same custom Dataset class as my own data.

The neural network class is exactly the same as the one for my custom data, except that it has 10 outputs instead of my custom data's 90.

The custom images vary in size and are all resized to 28x28 using the transforms.Compose() listed below. Within this 10-image subset there are images sized 800x170, 96x66, 64x34, 208x66, and so on...

The second tensor output comes from the 800x170 PNG.

The transforms performed on both datasets are exactly the same:

tf=transforms.Compose([
        transforms.Resize(size = (28,28)),
        transforms.Normalize(mean=[-0.5/0.5],std=[1/0.5])
        ])

No target transforms are performed.
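
A side note on that Normalize: torchvision.io.read_image() returns uint8 values in [0, 255], and the cast to FloatTensor keeps that range, so mean=[-0.5/0.5] (i.e. -1) with std=[1/0.5] (i.e. 2) maps pixels to roughly [0.5, 128], which is what both tensor dumps below show. A minimal sketch of one common alternative, scaling to [0, 1] before standardizing (the 0.5/0.5 constants are the usual convention, not values from my experiments):

tf = transforms.Compose([
        transforms.Resize(size=(28, 28)),
        transforms.Lambda(lambda t: t / 255.0),     # 0-255 floats -> [0, 1]
        transforms.Normalize(mean=[0.5], std=[0.5]) # [0, 1] -> [-1, 1]
        ])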

Output of the tensor, the tensor size, the class, and the train/test run performed afterwards:

(tensor([[[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  19.5000,
          119.0000,  54.0000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  32.5000,
          127.0000,  93.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  32.5000,
          127.0000, 106.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  32.5000,
          127.0000, 106.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  32.5000,
          127.0000, 106.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  85.5000,
          127.5000, 107.0000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  63.5000,
          127.0000, 106.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  59.0000,
          127.0000,  58.0000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  32.5000,
          127.0000,  66.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  32.5000,
          127.0000, 106.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  33.0000,
          128.0000, 107.0000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  32.5000,
          127.0000,  88.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  59.5000,
          127.0000,  54.0000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  85.0000,
          127.0000,  54.0000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  85.0000,
          127.0000,  54.0000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  85.5000,
          128.0000,  54.0000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  85.0000,
          127.0000,  54.0000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  85.0000,
          127.0000,  60.0000,   8.0000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  85.0000,
          127.0000, 127.5000,  84.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  28.0000,
          118.5000,  65.5000,  14.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
         [  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
            0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000]]]), 1)
torch.Size([1, 28, 28])
1

Train Epoch: 1 [0/25 (0%)]  Loss: -1.234500

Test set: Average loss: -1.6776, Accuracy: 1/5 (20%)

(tensor([[[68.1301, 67.3571, 68.4286, 67.9375, 69.5536, 69.2143, 69.0026,
          69.2283, 70.4464, 70.2857, 68.8839, 68.6071, 71.3214, 70.5102,
          71.0753, 71.9107, 71.5179, 71.5625, 73.6071, 71.9464, 73.2513,
          72.5804, 73.5000, 74.1429, 72.7768, 72.9107, 73.1786, 74.9069],
         [68.2028, 70.0714, 68.4821, 69.3661, 70.8750, 69.6607, 70.6569,
          70.2551, 70.9464, 70.3393, 70.3929, 71.3571, 71.1250, 72.1901,
          70.6850, 71.9464, 72.1071, 72.8304, 72.3036, 72.3214, 73.4528,
          73.4898, 72.4286, 73.0179, 73.1071, 73.5179, 73.0357, 74.0280],
         [71.3457, 70.4643, 70.4464, 70.7857, 70.6071, 71.9821, 71.6786,
          72.7564, 72.4107, 72.2321, 72.8571, 72.7321, 70.0357, 72.2640,
          73.8214, 72.8750, 73.0000, 73.0089, 74.8393, 74.1964, 74.9872,
          73.4248, 72.0179, 74.5357, 74.9018, 74.9821, 75.0357, 72.9286],
         [70.1429, 70.3750, 69.8750, 70.6250, 69.8750, 72.8750, 71.4107,
          71.5089, 73.3750, 73.2500, 74.4375, 73.8750, 73.0000, 74.4375,
          72.2768, 72.7500, 72.6250, 72.6250, 73.1250, 73.2500, 72.3571,
          73.0625, 72.5000, 74.8750, 73.6875, 74.2500, 75.2500, 73.7411],
         [53.1428, 56.1607, 57.4286, 58.3393, 60.6607, 59.3393, 62.2589,
          62.8380, 64.1250, 66.6429, 66.9821, 67.8750, 74.7679, 70.5192,
          68.7411, 69.3036, 66.0001, 67.9733, 67.4822, 68.3393, 68.3534,
          69.5740, 69.4465, 70.9465, 69.0983, 72.2679, 70.4286, 70.1493],
         [61.2143, 63.0000, 69.0357, 65.3393, 62.3214, 59.8036, 56.2730,
          54.5829, 52.8393, 52.8929, 50.8304, 52.9107, 66.4643, 69.6875,
          71.1849, 72.2678, 73.9821, 74.4643, 73.0357, 74.1250, 75.6492,
          76.2360, 75.7679, 75.6071, 75.2857, 74.9286, 74.8929, 75.1850],
         [54.9439, 62.5357, 69.7143, 72.0000, 71.2500, 74.1607, 75.9987,
          79.6416, 79.5179, 81.4822, 77.3214, 75.2143, 49.6071, 59.7513,
          71.4350, 74.4822, 73.5000, 73.8214, 72.2322, 73.7143, 73.9822,
          74.5893, 74.7322, 74.8572, 76.2947, 71.5714, 73.4822, 74.8533],
         [63.4298, 61.0357, 61.6072, 59.6697, 57.8036, 59.2322, 56.5982,
          57.2079, 55.3393, 56.3572, 56.5804, 58.7322, 79.7499, 73.1900,
          65.2423, 75.5357, 74.5356, 75.6250, 72.5893, 74.7321, 74.6135,
          75.8852, 75.6964, 75.7678, 76.4286, 74.2500, 74.7857, 76.1671],
         [63.7870, 60.3750, 67.5179, 67.5446, 66.7857, 66.2857, 66.4515,
          68.5089, 68.5714, 67.0714, 68.5982, 66.7678, 57.3929, 67.2806,
          68.9503, 72.9286, 74.0893, 73.4911, 74.2143, 73.3393, 72.4873,
          73.3916, 71.7500, 75.4821, 73.8393, 74.8750, 74.6429, 75.0906],
         [72.9260, 69.0178, 67.9643, 69.2321, 67.5178, 67.3750, 66.3814,
          64.8890, 63.8572, 64.9464, 66.9821, 66.3928, 63.0000, 64.7449,
          74.8800, 63.5178, 72.2143, 73.2321, 74.9286, 74.5893, 71.6938,
          74.8635, 73.9107, 75.5536, 75.8036, 76.2857, 76.3750, 75.2564],
         [72.1160, 69.5000, 72.0000, 69.4375, 71.2500, 70.5000, 72.3392,
          73.5982, 71.5000, 72.3750, 68.8750, 67.1249, 65.3750, 60.2856,
          61.6427, 65.3749, 67.4999, 65.0624, 70.4999, 69.4999, 65.3124,
          71.9107, 69.7499, 72.8750, 72.5625, 72.7500, 74.8750, 73.7053],
         [64.3763, 64.8571, 70.4642, 66.7857, 64.3214, 65.3928, 67.4859,
          68.7385, 67.8750, 67.8750, 71.0267, 72.8749, 67.5356, 59.4106,
          58.7625, 70.2319, 62.5534, 65.7141, 68.1249, 69.0713, 65.2013,
          72.8392, 67.1427, 71.7500, 72.8482, 72.6071, 74.4285, 74.0051],
         [69.7219, 71.8214, 67.4464, 68.6518, 66.0178, 66.1071, 65.5089,
          65.6964, 65.6964, 61.0714, 61.4375, 61.8214, 67.8214, 61.8762,
          57.3354, 66.8749, 63.8571, 60.3302, 62.9999, 67.8214, 68.9043,
          71.6365, 67.5357, 75.6250, 74.6518, 73.6071, 74.5178, 75.3877],
         [72.2857, 66.2857, 63.1964, 69.2232, 68.8214, 70.2857, 68.7895,
          70.2436, 70.1250, 66.8750, 69.9643, 66.0893, 52.8393, 60.3201,
          52.9273, 66.8571, 58.0535, 57.3035, 63.2321, 60.1785, 59.6058,
          69.9936, 69.4286, 73.4821, 72.7143, 72.8750, 72.7500, 74.0791],
         [65.7334, 56.6430, 60.7143, 67.8035, 66.5178, 65.8214, 67.6760,
          67.3061, 65.6964, 64.5893, 53.1430, 68.4820, 52.7676, 48.1604,
          48.1311, 65.3034, 51.9640, 61.8213, 59.6605, 57.3927, 54.6974,
          75.5752, 73.1250, 74.3928, 74.0446, 72.2142, 72.2857, 77.7806],
         [55.4095, 60.0893, 69.7142, 66.0892, 66.8750, 65.6607, 67.1926,
          66.3712, 63.0000, 56.9465, 41.6073, 48.6609, 61.8035, 39.7281,
          44.9195, 61.5892, 47.5891, 62.7678, 56.9641, 55.9820, 58.1236,
          70.0548, 70.3750, 69.8392, 68.1517, 72.0535, 76.5893, 65.4489],
         [60.6237, 66.5714, 67.8571, 65.7232, 66.2500, 67.6250, 66.9311,
          67.3303, 64.8214, 48.9644, 45.9019, 49.4108, 51.6608, 43.9259,
          47.5012, 38.9642, 37.5356, 66.0000, 65.5178, 49.3392, 57.3571,
          67.8252, 69.7678, 70.2143, 51.7410, 76.1607, 69.7143, 54.4056],
         [61.9643, 67.2500, 66.5000, 65.6875, 66.2500, 65.0000, 65.0625,
          65.5268, 63.7500, 49.8750, 50.4375, 53.1250, 38.7500, 25.3750,
          43.4286, 31.1250, 35.3750, 59.7500, 63.3750, 39.5000, 51.8125,
          58.6249, 69.5000, 70.1250, 48.0000, 75.8750, 48.7500, 61.4018],
         [67.8915, 65.7500, 66.3035, 66.5982, 66.0357, 64.9464, 65.4643,
          65.8074, 63.4643, 56.2325, 48.3306, 54.9467, 22.0715, 23.6990,
          29.0955, 27.3211, 29.4997, 57.8660, 68.2321, 36.9819, 50.7715,
          52.6707, 69.7143, 71.3392, 55.5534, 45.7855, 62.9463, 64.1556],
         [63.8431, 66.0893, 65.3571, 65.6161, 65.0893, 64.6964, 64.3444,
          65.1225, 62.9107, 57.4287, 57.3216, 54.9287, 26.4465, 30.5689,
          23.2499, 23.5534, 25.1605, 55.1071, 69.4643, 41.9642, 52.6619,
          59.8954, 72.0893, 79.7322, 47.2856, 64.5000, 52.9463, 81.6888],
         [64.2589, 69.9643, 71.5000, 75.2857, 77.6786, 78.6429, 76.2513,
          71.0089, 67.5536, 60.8929, 57.2501, 48.1072, 22.4821, 44.3316,
          17.5369, 24.3928, 22.8214, 45.4821, 67.8036, 35.4821, 43.7028,
          52.7806, 81.8929, 56.7321, 60.5357, 44.2321, 82.6964, 72.7500],
         [63.6748, 61.8929, 58.0001, 41.7859, 47.3037, 35.2502, 40.0525,
          63.9669, 76.1962, 74.6603, 67.2228, 43.3748, 19.9821, 37.0776,
          15.6544, 30.9823, 22.0182, 51.0984, 65.8215, 32.5717, 49.4747,
          39.5946, 49.5359, 55.7859, 40.7681, 81.7857, 76.0357, 73.2832],
         [60.0192, 53.6429, 43.5359, 44.8037, 39.9287, 48.8037, 48.3241,
          35.5882, 22.6071, 20.7142, 33.8838, 45.3570, 25.0714, 32.6657,
          26.8559, 22.9644, 27.7324, 69.4375, 62.5001, 33.9823, 48.6047,
          33.4811, 38.3930, 58.5358, 74.2857, 73.2679, 68.8572, 71.0817],
         [63.2500, 63.3393, 43.1608, 50.3751, 68.6786, 69.6429, 63.9324,
          65.5510, 59.6249, 54.3035, 40.5267, 20.6071, 32.1785, 31.9834,
          30.0791, 20.3036, 34.1073, 71.0000, 56.2322, 48.2501, 42.9695,
          37.1225, 53.7322, 68.3750, 76.2232, 72.4822, 70.6072, 72.9324],
         [63.1071, 64.1250, 65.7500, 41.7500, 26.2500, 25.6250, 25.1071,
          24.1339, 18.8750, 23.5000, 35.5625, 44.5000, 31.1250, 37.3393,
          28.3125, 23.6250, 39.3750, 67.1875, 60.7500, 53.2500, 41.6250,
          39.1339, 61.2500, 81.0000, 71.3125, 70.8750, 71.5000, 72.1339],
         [67.4796, 68.1429, 68.9821, 76.4286, 75.0893, 74.6250, 73.8419,
          72.7398, 58.4108, 44.3572, 33.2322, 19.8036, 32.6965, 29.7296,
          28.5957, 19.8750, 42.7499, 69.9196, 66.3214, 51.9285, 43.6848,
          44.9017, 64.2857, 73.2857, 71.7321, 71.4286, 73.9286, 73.5893],
         [67.7080, 67.9465, 68.0358, 69.1786, 69.1071, 69.7857, 69.0650,
          70.3635, 60.1247, 52.3744, 52.1690, 44.3031, 30.2678, 29.7014,
          20.1314, 25.4645, 45.8042, 74.2947, 63.4110, 56.0183, 49.2722,
          50.1485, 73.1251, 74.6608, 74.3036, 73.8572, 72.2322, 74.1570],
         [67.5868, 68.5179, 68.1786, 66.9018, 67.3215, 67.9822, 67.2628,
          65.4694, 49.2318, 43.7318, 39.5888, 47.7318, 29.2499, 28.3277,
          15.6326, 30.8215, 34.2502, 64.6428, 63.3572, 63.0001, 50.1688,
          51.6037, 77.5000, 75.8215, 73.7501, 74.9286, 74.3572, 74.6097]]]), 20)
torch.Size([1, 28, 28])
20
Train Epoch: 1 [0/8 (0%)]   Loss: -1.982941

Test set: Average loss: 0.0000, Accuracy: 0/2 (0%)

Error information

The output above is from a run that succeeded with no segmentation fault; the segfault typically happens 4 times out of 5. When it does happen, it never happens while processing the MNIST subset; it only happens when accessing my custom dataset, whether at dataset[0] or any other index. But if I run a simple print statement on any index enough times, I can at least get it to produce output once instead of crashing. Here is a case where it crashed more gracefully (it printed the tensor info and the size/class, but crashed during training):

torch.Size([1, 28, 28])
65
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in _try_get_data(self, timeout)
    989         try:
--> 990             data = self._data_queue.get(timeout=timeout)
    991             return (True, data)

9 frames
/usr/lib/python3.7/queue.py in get(self, block, timeout)
    178                         raise Empty
--> 179                     self.not_empty.wait(remaining)
    180             item = self._get()

/usr/lib/python3.7/threading.py in wait(self, timeout)
    299                 if timeout > 0:
--> 300                     gotit = waiter.acquire(True, timeout)
    301                 else:

/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/signal_handling.py in handler(signum, frame)
     65         # Python can still get and update the process status successfully.
---> 66         _error_if_any_worker_fails()
     67         if previous_handler is not None:

RuntimeError: DataLoader worker (pid 1132) is killed by signal: Segmentation fault. 

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
<ipython-input-9-02c9a53ca811> in <module>()
     68 
     69 if __name__ == '__main__':
---> 70     main()

<ipython-input-9-02c9a53ca811> in main()
     60 
     61     for epoch in range(1, args.epochs + 1):
---> 62         train(args, model, device, train_loader, optimizerAdadelta, epoch)
     63         test(model, device, test_loader)
     64         scheduler.step()

<ipython-input-6-93be0b7e297c> in train(args, model, device, train_loader, optimizer, epoch)
      2 def train(args, model, device, train_loader, optimizer, epoch):
      3     model.train()
----> 4     for batch_idx, (data, target) in enumerate(train_loader):
      5         data, target = data.to(device), target.to(device)
      6         optimizer.zero_grad()

/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in __next__(self)
    519             if self._sampler_iter is None:
    520                 self._reset()
--> 521             data = self._next_data()
    522             self._num_yielded += 1
    523             if self._dataset_kind == _DatasetKind.Iterable and \

/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in _next_data(self)
   1184 
   1185             assert not self._shutdown and self._tasks_outstanding > 0
-> 1186             idx, data = self._get_data()
   1187             self._tasks_outstanding -= 1
   1188             if self._dataset_kind == _DatasetKind.Iterable:

/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in _get_data(self)
   1140         elif self._pin_memory:
   1141             while self._pin_memory_thread.is_alive():
-> 1142                 success, data = self._try_get_data()
   1143                 if success:
   1144                     return data

/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in _try_get_data(self, timeout)
   1001             if len(failed_workers) > 0:
   1002                 pids_str = ', '.join(str(w.pid) for w in failed_workers)
-> 1003                 raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
   1004             if isinstance(e, queue.Empty):
   1005                 return (False, None)

RuntimeError: DataLoader worker (pid(s) 1132) exited unexpectedly

But in general this 'crash for an unknown reason' problem appears, and this is what my logs look like when it happens:

[screenshot of logs]

What I think is going on / what I have tried

I think something is wrong with the tensor values and the way the images are being read. I only ever work with at most 40 images at a time, so there is no reason for disk space or RAM on Google Colab to run out. I may not be normalizing the data correctly; I have tried different values but that hasn't solved it yet. Or maybe the images are corrupt?
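
One way to test the corrupt-image theory is to try decoding every file listed in the CSV up front, in the main process and outside any DataLoader, so that a failing file raises a visible error with its path instead of silently killing a worker. A minimal sketch (the csv and directory paths are placeholders for mine):

import os
import pandas as pd
import torchvision

labels = pd.read_csv('annotations.csv') # placeholder path
img_dir = 'all_images/sample_10'        # placeholder path

for name in labels.iloc[:, 0]:
    path = os.path.join(img_dir, name)
    print(path) # printed before decoding, so even a hard crash identifies the file
    try:
        torchvision.io.read_image(path)
    except RuntimeError as e:
        print(f'failed to decode {path}: {e}')

This catches the RuntimeError shown in the edit further down; an outright segmentation fault would still kill the process, but the path printed just beforehand narrows down the offending file.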

I don't really know what else could be going on, otherwise I would have solved it by now. I think I have provided enough material to make this a well-defined problem for an expert in this area. I have spent a lot of time on this post, and I hope someone can help me find the root of the problem.

Please also let me know about any other obvious problems with my code or with how I use the network and the custom dataset, since this is my first time using PyTorch.

Thank you!

Additional information that I'm not sure is relevant:

Custom Dataset class:

# ------------ Custom Dataset Class ------------
import os
import pandas as pd
import torch
import torchvision
from torch.utils.data import Dataset

class PhytoplanktonImageDataset(Dataset):
  def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
    self.img_labels = pd.read_csv(annotations_file) # image name and label file loaded into img_labels
    self.img_dir = img_dir # directory holding all the image files
    self.transform = transform # transforms to apply to images
    self.target_transform = target_transform # transforms to apply to labels

  def __len__(self):
    return len(self.img_labels) # get length of csv file
  
  def __getitem__(self, idx):
    img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0]) # image path from the csv
    image = torchvision.io.read_image(path=img_path) # decode to a uint8 tensor (C x H x W, values 0-255)
    image = image.type(torch.FloatTensor) # now a FloatTensor (not a ByteTensor)
    label = self.img_labels.iloc[idx, 1] # label from the csv
    if self.transform:
      image = self.transform(image)
    if self.target_transform:
      label = self.target_transform(label)
    return image, label
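
Since the traceback in the edit below points into torch.ops.image.decode_image, one workaround I would consider is bypassing torchvision's native PNG decoder and loading through PIL, which also handles 16-bit grayscale PNGs. A sketch of an alternative __getitem__ under that assumption (not the code I am currently running):

import numpy as np
from PIL import Image

def __getitem__(self, idx): # drop-in replacement for the method above
    img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
    arr = np.array(Image.open(img_path), dtype=np.float32) # H x W array; 16-bit files come out as 0-65535
    image = torch.from_numpy(arr).unsqueeze(0)             # 1 x H x W FloatTensor
    label = self.img_labels.iloc[idx, 1]
    if self.transform:
        image = self.transform(image)
    if self.target_transform:
        label = self.target_transform(label)
    return image, label

Note that 16-bit pixels then span [0, 65535], so any fixed normalization constants would need adjusting.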

NN class (the only change is that for MNIST the final nn.Linear() has 10 outputs):

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 90),
            nn.ReLU()
        )
    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

Arguments used:

args = parser.parse_args(['--batch-size', '64', '--test-batch-size', '64', 
                            '--epochs', '1', '--lr', '0.01', '--gamma', '0.7', '--seed','4', 
                            '--log-interval', '10'])

EDIT: On one of the runs I was able to get the following graceful exit (this traceback reaches into the __getitem__ call):

<ipython-input-3-ae5ff8635158> in __getitem__(self, idx)
     13     img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0]) # image path
     14     print(img_path)
---> 15     image = torchvision.io.read_image(path=img_path) # Reading image to 1 dimensional GRAY Tensor uint between 0-255
     16     image = image.type(torch.FloatTensor) # Now a FloatTensor (not a ByteTensor)
     17     label = self.img_labels.iloc[idx,1] # getting label from csv

/usr/local/lib/python3.7/dist-packages/torchvision/io/image.py in read_image(path, mode)
    258     """
    259     data = read_file(path)
--> 260     return decode_image(data, mode)

/usr/local/lib/python3.7/dist-packages/torchvision/io/image.py in decode_image(input, mode)
    237         output (Tensor[image_channels, image_height, image_width])
    238     """
--> 239     output = torch.ops.image.decode_image(input, mode.value)
    240     return output
    241 

RuntimeError: Internal error.

This is the image path printed just before the decode fails: /content/gdrive/My Drive/Colab Notebooks/all_images/sample_10/D20190926T145532_IFCB122_00013.png and this is what that image looks like: [image]

Information about this image:

Color model: Gray

Depth: 16

Pixel height: 50

Pixel width: 80

Image DPI: 72 pixels per inch

File size: 3,557 bytes
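
Worth noting: that "Depth: 16" line may matter. As far as I can tell, torchvision's native PNG decoder expects 8-bit images (at least in the version Colab gives me), so a 16-bit grayscale PNG would be consistent with the "RuntimeError: Internal error" in the edit above. A one-time conversion of such files down to 8-bit would test this; a sketch with a placeholder path:

import numpy as np
from PIL import Image

path = 'D20190926T145532_IFCB122_00013.png' # placeholder; loop over the whole directory in practice
img = Image.open(path)
if img.mode in ('I', 'I;16'):            # PIL's 16-bit grayscale modes
    arr = np.array(img, dtype=np.uint32)
    arr8 = (arr // 256).astype(np.uint8) # 0-65535 -> 0-255
    Image.fromarray(arr8, mode='L').save(path)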

I would suggest looking at the num_workers parameter of your DataLoader. If your num_workers value is too high, it can cause this error, so I would suggest lowering it to zero, or until you no longer get the error.
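
For example (a sketch; only num_workers changes, keep the other arguments as whatever you already pass):

from torch.utils.data import DataLoader

train_loader = DataLoader(dataset, batch_size=64, shuffle=True,
                          num_workers=0) # load in the main process

With num_workers=0 the batches are loaded in the main process, so a decoding failure surfaces as an ordinary traceback instead of "DataLoader worker ... killed by signal".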

萨萨克