sampler argument in DataLoader of PyTorch

Question

While using PyTorch's DataLoader utility, what is the purpose of RandomIdentitySampler as the sampler? RandomIdentitySampler has an argument instances. Does instances depend on the number of workers? If there are 4 workers, should there be 4 instances as well?

Here is the code block:

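from torch.utils.data import DataLoader

# Preprocessor, RandomIdentitySampler, cluster_dataset, train_transformer
# and args are defined elsewhere in the asker's project.
c_dataloaders = DataLoader(
    Preprocessor(cluster_dataset.train_set,
                 root=cluster_dataset.images_dir,
                 transform=train_transformer),
    batch_size=args.batch_size_stage2,
    num_workers=args.workers,
    sampler=RandomIdentitySampler(cluster_dataset.train_set,
                                  args.batch_size_stage2,
                                  args.instances))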

Answer 1

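This sampler is not part of PyTorch or any other official library (torchvision, torchtext, etc.). There is, however, a RandomIdentitySampler in KaiyangZhou's torchreid. Assuming that is the one you are using: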

While using Pytorch's DataLoader utility, in sampler what is the purpose of RandomIdentitySampler?

As you can see in the DataLoader documentation, the sampler "defines the strategy to draw samples from the dataset". More specifically, based on the RandomIdentitySampler documentation, it "randomly samples N identities each with K instances".
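To make "N identities each with K instances" concrete, here is a minimal sketch of such a sampler. It is a simplified illustration, not torchreid's implementation: the class name `SimpleIdentitySampler`, its `labels` and `seed` parameters, and the choice to visit each identity once per epoch are all assumptions made for this example. It uses only the standard library; since DataLoader's `sampler` argument accepts any iterable of indices that implements `__len__`, an object like this could in principle be passed as `sampler=`.

```python
import random
from collections import defaultdict


class SimpleIdentitySampler:
    """Yield dataset indices so that every consecutive chunk of
    batch_size indices holds N identities with K instances each.

    Hypothetical, simplified sketch; NOT torchreid's RandomIdentitySampler.
    """

    def __init__(self, labels, batch_size, num_instances, seed=0):
        if batch_size % num_instances != 0:
            raise ValueError("batch_size must be a multiple of num_instances")
        self.num_instances = num_instances                # K
        self.ids_per_batch = batch_size // num_instances  # N
        self.rng = random.Random(seed)
        # Group dataset indices by identity label.
        self.index_by_id = defaultdict(list)
        for idx, pid in enumerate(labels):
            self.index_by_id[pid].append(idx)
        self.pids = list(self.index_by_id)

    def _usable(self):
        # Drop leftover identities that cannot fill a whole batch.
        return len(self.pids) - len(self.pids) % self.ids_per_batch

    def __iter__(self):
        pids = self.pids[:]
        self.rng.shuffle(pids)
        for pid in pids[:self._usable()]:
            pool = self.index_by_id[pid]
            if len(pool) >= self.num_instances:
                chosen = self.rng.sample(pool, self.num_instances)
            else:
                # Too few samples for this identity: draw with replacement.
                chosen = self.rng.choices(pool, k=self.num_instances)
            yield from chosen

    def __len__(self):
        return self._usable() * self.num_instances


# Toy labels: identity 0 has 5 samples, 1 has 3, 2 has 1, 3 has 4.
labels = [0, 0, 0, 0, 0, 1, 1, 1, 2, 3, 3, 3, 3]
sampler = SimpleIdentitySampler(labels, batch_size=4, num_instances=2)
indices = list(sampler)
batches = [indices[i:i + 4] for i in range(0, len(indices), 4)]
for batch in batches:
    print([labels[i] for i in batch])  # two identities, two instances each
```

Note that nothing in the sketch refers to worker processes: the sampler only decides which indices are drawn and in what order.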

And in RandomIdentitySampler there is an argument instances. Does instances depend on the number of workers?

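As the previous answer shows, instances does not depend on the number of workers. It simply sets how many instances of each identity are drawn from the dataset for each batch. The num_workers argument only controls how many subprocesses are used to load the data.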

If there are 4 workers then should there be 4 instances as well?

Not necessarily. The only constraint is that the batch size must be no less than the number of instances.
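As a rough sketch of how the batch size and instances relate, assume the sampler requires batch_size >= instances and splits each batch evenly into identities. The helper `identities_per_batch` below is hypothetical and written only for this illustration; it is not a torchreid function. The number of workers is deliberately not an input.

```python
def identities_per_batch(batch_size, num_instances):
    """Hypothetical helper: how many identities (N) fit into one batch
    when each identity contributes num_instances (K) samples."""
    if batch_size < num_instances:
        raise ValueError(
            f"batch_size={batch_size} must be no less than "
            f"num_instances={num_instances}"
        )
    return batch_size // num_instances


print(identities_per_batch(32, 4))  # 8 identities x 4 instances
print(identities_per_batch(4, 4))   # 1 identity x 4 instances
```

With 4 workers, either of these settings is valid; the workers only load the samples that the sampler has already chosen.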


python

pytorch

pytorch-dataloader