Memory error when iterating over two dataloaders simultaneously in PyTorch
I am trying to train my model using 2 dataloaders from 2 different datasets.
I found how to set this up with cycle() and zip(), since my datasets are not the same length, but I run into the error here:
File "/home/Desktop/example/train.py", line 229, in train_2
for i, (x1, x2) in enumerate(zip(cycle(train_loader_1), train_loader_2)):
File "/home/.conda/envs/3dcnn/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 346, in __next__
data = self.dataset_fetcher.fetch(index) # may raise StopIteration
File "/home/.conda/envs/3dcnn/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
return self.collate_fn(data)
File "/home/.conda/envs/3dcnn/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 80, in default_collate
return [default_collate(samples) for samples in transposed]
File "/home/.conda/envs/3dcnn/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 80, in <listcomp>
return [default_collate(samples) for samples in transposed]
File "/home/.conda/envs/3dcnn/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py", line 56, in default_collate
return torch.stack(batch, 0, out=out)
RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 154140672 bytes. Error code 12 (Cannot allocate memory)
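Stripped down, the loop that triggers it looks like this (a sketch; the dataset and DataLoader construction are omitted):

    from itertools import cycle

    # cycle() makes train_loader_1 repeat indefinitely, so zip() runs
    # for the full length of the longer train_loader_2.
    for i, (x1, x2) in enumerate(zip(cycle(train_loader_1), train_loader_2)):
        ...  # training step on the pair of batches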
I tried to fix this by setting num_workers=0, reducing the batch size, and using pin_memory=False and shuffle=False... but none of it worked. I have 256GB of RAM and 4 NVIDIA TESLA V100 GPUs.
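For reference, the loader settings I tried look like this (a sketch; dataset_1, dataset_2 and the batch size are placeholders):

    from torch.utils.data import DataLoader

    # Conservative settings: no worker processes, no pinned memory,
    # no shuffling, small batches -- the allocation still failed.
    train_loader_1 = DataLoader(dataset_1, batch_size=4, shuffle=False,
                                num_workers=0, pin_memory=False)
    train_loader_2 = DataLoader(dataset_2, batch_size=4, shuffle=False,
                                num_workers=0, pin_memory=False)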
I tried running it without iterating over the 2 dataloaders at the same time, training on each one separately instead, and that works. But for my project I need to train on the 2 datasets in parallel...
Based on this discussion, instead of cycle() and zip() I avoided the error by using:
try:
    data, target = next(dataloader_iterator)
except StopIteration:
    dataloader_iterator = iter(dataloader)
    data, target = next(dataloader_iterator)
Thanks to @srossi93 from this PyTorch post!
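For completeness, here is how that snippet can sit in a full loop over two loaders of different lengths (a sketch; the loader names match my setup and the training step is a placeholder). Unlike itertools.cycle, which saves a copy of every element it yields and can therefore slowly exhaust host RAM, re-creating the iterator on StopIteration keeps only the current batch alive:

    # Drive the loop with train_loader_2 and restart train_loader_1
    # by hand whenever it is exhausted.
    dataloader_iterator = iter(train_loader_1)
    for x2 in train_loader_2:
        try:
            x1 = next(dataloader_iterator)
        except StopIteration:
            # Re-create the iterator instead of using cycle(): nothing
            # from previous passes over the loader stays in memory.
            dataloader_iterator = iter(train_loader_1)
            x1 = next(dataloader_iterator)
        ...  # training step on batches x1 and x2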