RuntimeError: DataLoader worker (pid(s) 15876, 2756) exited unexpectedly

I am working through some of the existing examples on the PyTorch tutorials website. I am working specifically on a CPU-only machine with no GPU.

When I run the program, it shows the following error. Is it caused by running on a CPU device, or is it a setup problem?

raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 15876, 2756) exited unexpectedly

How can I fix it?

import torch
import torch.nn.functional as F
import torch.nn as nn
import torch.optim as optim

import torchvision
import torchvision.transforms as transforms

import matplotlib.pyplot as plt
import numpy as np

from torch.utils.tensorboard import SummaryWriter
from torch.utils.data import DataLoader
from torchvision import datasets

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])

# Store separate training and validation splits in ./data
training_set = datasets.FashionMNIST(
    root='data',
    train=True,
    download=True,
    transform=transform
)
validation_set = datasets.FashionMNIST(
    root='data',
    train=False,
    download=True,
    transform=transform
)
training_loader = DataLoader(training_set, batch_size=4, shuffle=True, num_workers=2)
validation_loader = DataLoader(validation_set, batch_size=4, shuffle=False, num_workers=2)
classes = ('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
    'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot')


def matplotlib_imshow(img, one_channel=False):
    if one_channel:
        img = img.mean(dim=0)
    img = img / 2 + 0.5  # unnormalize
    npimg = img.numpy()
    if one_channel:
        plt.imshow(npimg, cmap="Greys")
    else:
        plt.imshow(np.transpose(npimg, (1, 2, 0)))


dataiter = iter(training_loader)
images, labels = next(dataiter)

img_grid = torchvision.utils.make_grid(images)
matplotlib_imshow(img_grid, one_channel=True)

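You first need to find out why the DataLoader worker crashed. A common cause is running out of memory. You can check this by running dmesg -T after the script crashes and seeing whether the system killed any python process.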
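If you would rather do that check from Python, here is a minimal sketch. It assumes a Linux machine where dmesg is available (it may need elevated privileges), and the marker strings are only my guess at what OOM-killer lines usually contain. The names find_oom_kills and read_dmesg are made up for this example, and the sample log text is illustrative, not output from a real machine.

```python
import subprocess

# Substrings that commonly appear in kernel OOM-killer messages (an assumption,
# the exact wording can differ between kernel versions).
OOM_MARKERS = ("out of memory", "oom-kill", "killed process")


def find_oom_kills(dmesg_text):
    """Return the kernel log lines that look like OOM-killer activity."""
    return [
        line
        for line in dmesg_text.splitlines()
        if any(marker in line.lower() for marker in OOM_MARKERS)
    ]


def read_dmesg():
    """Run `dmesg -T` and return its output (Linux only)."""
    result = subprocess.run(
        ["dmesg", "-T"], capture_output=True, text=True, check=False
    )
    return result.stdout


# Illustrative log text, not taken from a real machine.
sample = (
    "[Mon Jan  1 12:00:00 2024] usb 1-1: new high-speed USB device\n"
    "[Mon Jan  1 12:00:05 2024] Out of memory: Killed process 15876 (python)\n"
)
hits = find_oom_kills(sample)
print(hits)
```

On a real machine you would call find_oom_kills(read_dmesg()) right after the crash and look for a line mentioning your python process.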

Set num_workers=0. On Windows, setting num_workers to a value greater than 0 can cause this error because of multiprocessing limitations. This is expected.
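A small sketch of how that could be made automatic, so the same script still uses workers on other systems. The helper name choose_num_workers and its system parameter are hypothetical, introduced only for this example; it relies on nothing but the standard library's platform.system().

```python
import platform


def choose_num_workers(requested, system=None):
    """Return 0 on Windows, otherwise the requested worker count.

    `system` is only there so the behaviour can be checked without
    actually running on Windows; by default the current OS is used.
    """
    if system is None:
        system = platform.system()
    return 0 if system == "Windows" else requested


print(choose_num_workers(2, system="Windows"))
print(choose_num_workers(2, system="Linux"))
```

In the question's code this would be used as DataLoader(training_set, batch_size=4, shuffle=True, num_workers=choose_num_workers(2)), and likewise for validation_loader. The PyTorch documentation also describes wrapping the main script in an if __name__ == '__main__': block as a way to use workers on Windows; that alternative is not shown here.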

There is also an issue about this on GitHub: