PyTorch problem: my Jupyter notebook hangs when num_workers > 0

Question

This is my PyTorch code snippet. When I use num_workers > 0, my Jupyter notebook hangs. I have spent a lot of time on this problem without finding an answer. I don't have a GPU; I only use the CPU.

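import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import Dataset, DataLoader

class IndexedDataset(Dataset):

    def __init__(self, data, targets, test=False):
        self.dataset = data
        if not test:
            # Labels and the labeled/unlabeled mask are only stored for training data.
            self.labels = targets.numpy()
            self.mask = np.concatenate((np.zeros(NUM_LABELED), np.ones(NUM_UNLABELED)))

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        image = self.dataset[idx]
        # Note: self.labels does not exist when the dataset was built with test=True.
        return image, self.labels[idx]

    def display(self, idx):
        # Show a single sample as a grayscale image.
        plt.imshow(self.dataset[idx], cmap='gray')
        plt.show()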

train_set = IndexedDataset(train_data, train_target, test = False)

test_set = IndexedDataset(test_data, test_target, test = True)

train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, num_workers=2)

test_loader = DataLoader(test_set, batch_size=BATCH_SIZE, num_workers=2)

Any help would be appreciated.

Answer 1

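When num_workers is greater than 0, PyTorch uses multiple processes to load the data.

Jupyter notebooks have known issues with multiprocessing.

One way around this is to not use a Jupyter notebook at all: write a normal .py file and run it from the command line.

Or try the suggestions here: .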
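If the code has to stay in a notebook, another common fallback is num_workers=0, which makes the DataLoader load batches in the main process without starting workers. The sketch below is only an illustration: running_in_notebook and pick_num_workers are hypothetical helper names, and checking for the ipykernel module is a heuristic I am assuming, not something PyTorch or Jupyter provides for this purpose.

```python
import sys


def running_in_notebook():
    """Heuristic (assumption): a Jupyter kernel has `ipykernel` imported."""
    return "ipykernel" in sys.modules


def pick_num_workers(requested, in_notebook=None):
    """Return 0 inside a notebook, otherwise the requested worker count.

    `in_notebook` can be passed explicitly to override the heuristic.
    """
    if in_notebook is None:
        in_notebook = running_in_notebook()
    # num_workers=0 means the data is loaded in the main process.
    return 0 if in_notebook else requested


# Hypothetical usage with the loader from the question:
# train_loader = DataLoader(train_set, batch_size=BATCH_SIZE,
#                           num_workers=pick_num_workers(2))
```

This trades loading speed for reliability, so it suits quick experiments on CPU more than long training runs.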

Answer 2

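Because Jupyter Notebook does not support Python multiprocessing, there are two lightweight libraries for this, and you should install one of them as described here: 1 and 2.

I prefer to solve my problem without any external libraries, in one of two ways:

By converting my file from .ipynb to .py and running it in a terminal, with the loop over the loader placed under the __main__ guard as follows: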

...
...

train_set = IndexedDataset(train_data, train_target, test = False)

train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, num_workers=4)

if __name__ == '__main__':
    for images, label in train_loader:
        print(images.shape)

By using the multiprocessing library, as follows:

In try.ipynb:

import multiprocessing as mp
import processing as ps

...
...

train_set = IndexedDataset(train_data, train_target, test = False)
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE)
    
if __name__=="__main__":
    p = mp.Pool(8)
    r = p.map(ps.getShape,train_loader) 
    print(r)
    p.close()

In the processing.py file:

def getShape(data):
    for i in data:
        return i[0].shape
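A likely reason for putting getShape in a separate processing.py file (my assumption; the answer does not say so) is that multiprocessing sends functions to worker processes by pickling them, and pickle stores a function only as a module name plus a function name. A worker can re-import a function from a module, but it cannot look up a function that has no importable name. The sketch below illustrates this with the standard library only; can_pickle is a hypothetical helper, and math.sqrt stands in for a function defined in a module such as processing.py.

```python
import math
import pickle


def can_pickle(obj):
    """Return True if `obj` can be serialized with pickle.dumps."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A function that lives in an importable module is pickled by reference:
# the payload holds the module name and the function name, not the code.
payload = pickle.dumps(math.sqrt)
module_function_ok = can_pickle(math.sqrt)

# A lambda has no importable name, so it cannot be sent to a worker process.
lambda_ok = can_pickle(lambda x: x)

print(module_function_ok, lambda_ok)
```

The same reasoning suggests keeping any function passed to p.map in a .py file next to the notebook and importing it, as the answer does with ps.getShape.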
