Can I use the PyTorch DataLoader to load raw image data saved in CSV files?
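I have raw image data saved in separate CSV files (one image per file). I want to train a CNN on them using PyTorch. How should I load the data so that it is suitable as input to a CNN? (Also, the images have 1 channel, while image networks expect RGB input by default.)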
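PyTorch's DataLoader, as the name suggests, is just a utility class that helps you load data in parallel, build batches, shuffle, and so on. What you need is a custom Dataset implementation.
Setting aside the fact that storing images in CSV files is a bit unusual, you only need something like this: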
    import glob
    from pathlib import Path
    from typing import Any

    from torch.utils.data import Dataset, DataLoader


    class CustomDataset(Dataset):
        def __init__(self, path: Path):  # plus any other arguments you need
            # do some preliminary checks, e.g. your path exists, files are there...
            assert path.exists()
            ...
            # retrieve your files in some way, e.g. glob
            self.csv_files = list(glob.glob(str(path / "*.csv")))

        def __len__(self) -> int:
            # this lets you know len(dataset) once you instantiate it
            return len(self.csv_files)

        def __getitem__(self, index: int) -> Any:
            # this method is called by the dataloader, each index refers to
            # a CSV file in the list you built in the constructor
            csv = self.csv_files[index]
            # now do whatever you need to do and return some tensors
            # (load_image is a method you still have to write yourself)
            image, label = self.load_image(csv)
            return image, label
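The `load_image` call is left for you to implement. Here is a minimal sketch of one way to do it, under a few assumptions that are mine and not from the question: each CSV holds comma-separated numeric pixel values (one image row per line), and the label is the integer before the first underscore in the file name (e.g. `3_sample.csv`). The class name `CsvImageDataset` and that naming scheme are hypothetical, so adapt them to however your labels are actually stored.

```python
import glob
from pathlib import Path
from typing import Tuple

import numpy as np
import torch
from torch.utils.data import Dataset


class CsvImageDataset(Dataset):
    """Hypothetical dataset: one CSV file per image, one image row per line."""

    def __init__(self, path: Path):
        assert path.exists(), f"{path} does not exist"
        # sort so the index -> file mapping is the same on every run
        self.csv_files = sorted(glob.glob(str(path / "*.csv")))

    def __len__(self) -> int:
        return len(self.csv_files)

    def load_image(self, csv_file: str) -> Tuple[torch.Tensor, int]:
        # assumption: comma-separated numeric pixels, H rows by W columns
        pixels = np.loadtxt(csv_file, delimiter=",", dtype=np.float32, ndmin=2)
        # add a leading channel axis: (H, W) -> (1, H, W)
        image = torch.from_numpy(pixels).unsqueeze(0)
        # assumption: label is the integer before the first underscore
        # in the file name, e.g. "3_sample.csv" -> 3
        label = int(Path(csv_file).stem.split("_")[0])
        return image, label

    def __getitem__(self, index: int) -> Tuple[torch.Tensor, int]:
        return self.load_image(self.csv_files[index])
```

Each item comes back as a `(1, H, W)` float tensor, which the default collate function of the DataLoader stacks into `(N, 1, H, W)` batches as long as all images have the same size.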
That's pretty much it. You can then create the dataset, pass it to a data loader, and iterate over the loader, for example:
    dataset = CustomDataset(Path("path/to/csv/files"))
    # add any other DataLoader options you need, e.g. batch_size
    train_loader = DataLoader(dataset, shuffle=True, num_workers=8)
    for batch in train_loader:
        ...
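As for the single-channel part of the question, I can think of two common ways to handle it, sketched below on a random stand-in batch (the 28x28 size and the 16 output channels are arbitrary choices of mine). Either give your own network a first convolution with `in_channels=1`, or repeat the gray channel three times if you want to feed a network that expects RGB.

```python
import torch
from torch import nn

# stand-in for one batch from the loader: 4 gray images, shaped (N, C, H, W)
batch = torch.rand(4, 1, 28, 28)

# option 1: make the first layer of your own CNN accept a single channel
conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
features = conv(batch)  # shape (4, 16, 28, 28)

# option 2: repeat the gray channel so an RGB network can consume it
# (expand creates a view, no pixel data is copied)
rgb_batch = batch.expand(-1, 3, -1, -1)  # shape (4, 3, 28, 28)
```

Option 1 is usually the simpler choice when you train from scratch. Option 2 is mostly useful when you want to reuse a model whose first layer was built for 3 channels.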