Why does torchvision.utils.make_grid() return copies of the wanted grid?

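In the code sample below, I can't understand why the output tensor grid has shape (3, 28, 280). I see why its height is 28 and its width is 280, but not where the 3 comes from. Judging from plt.imshow(), the three 28x280 arrays along axis 0 are identical copies, since displaying any one of them gives me the image I want. I also don't understand why I can pass grid to plt.imshow() at all: I expected it to take a 2D array, and grid is clearly 3D.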

import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np

train_set = torchvision.datasets.FashionMNIST(
    root = './pytorch_obj_classifier/data/FashionMNIST',
    train = True,
    download = True,
    transform = transforms.Compose([
            transforms.ToTensor()
    ])
)
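# train_loader was not defined in the original snippet; batch_size=10 is assumed
# because it matches the reported grid width of 280 (10 images x 28 pixels)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=10)
sample = next(iter(train_loader))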
image,label = sample
print(image.shape)

grid = torchvision.utils.make_grid(image,padding=0, nrow=10)
print(grid.shape)

plt.figure(figsize=(15,15))
grid = np.transpose(grid,(1,2,0))
grid1 = grid[:,:,0]
grid2 = grid[:,:,1]
grid3 = grid[:,:,2]
plt.imshow(grid1,cmap = 'gray')
plt.imshow(grid2,cmap = 'gray')
plt.imshow(grid3,cmap = 'gray')
plt.imshow(grid,cmap = 'gray')

The FashionMNIST dataset (like MNIST) consists of grayscale images. If you look at the implementation of torchvision.utils.make_grid, single-channel images have their channel copied three times:

if tensor.dim() == 4 and tensor.size(1) == 1:  # single-channel images
    tensor = torch.cat((tensor, tensor, tensor), 1)

As for matplotlib.pyplot.imshow, it accepts either a 2D array or a 3D array whose last axis has size 3 or 4. From its documentation:

The image data. Supported array shapes are:

  • (M, N): an image with scalar data. The data is visualized using a colormap.
  • (M, N, 3): an image with RGB values (0-1 float or 0-255 int).
  • (M, N, 4): an image with RGBA values (0-1 float or 0-255 int), i.e. including transparency.
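
The shapes listed above can be checked with a short sketch. It assumes random data in place of the real grid and uses the non-interactive Agg backend so it runs without a display; the array names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

gray_2d = np.random.rand(28, 280)           # (M, N): scalar data, drawn with a colormap
rgb_3d = np.stack([gray_2d] * 3, axis=-1)   # (M, N, 3): treated as RGB values
chw = np.transpose(rgb_3d, (2, 0, 1))       # (3, M, N): channel-first, as make_grid returns

fig, (ax1, ax2) = plt.subplots(2, 1)
ax1.imshow(gray_2d, cmap="gray")   # cmap is applied to the scalar data
ax2.imshow(rgb_3d, cmap="gray")    # cmap is ignored for RGB(A) input

# A channel-first array is rejected because its last axis is neither 3 nor 4
rejected = False
try:
    ax2.imshow(chw)
except TypeError as err:
    rejected = True
    print(err)
```

This is why the question's np.transpose(grid, (1, 2, 0)) step is needed before calling plt.imshow(grid), and why cmap='gray' has no effect on that last call.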

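In general, we don't talk about "dimensions" but describe tensors by their shape (the size along each axis). In PyTorch, images conventionally have three axes, with shape (channel, height, width). This holds even for single-channel images: think of them as a 3D tensor (1, height, width) rather than a 2D tensor (height, width). That keeps things consistent with the multi-channel case, which is very common (cf. convolutional neural networks).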
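
As a small illustration of moving between these shapes, the sketch below uses a random tensor as a stand-in for one grayscale image; the variable names are illustrative only.

```python
import torch

flat = torch.rand(28, 28)        # 2D (height, width) array
image = flat.unsqueeze(0)        # add the channel axis: (1, 28, 28)
batch = image.unsqueeze(0)       # add the batch axis: (1, 1, 28, 28)

# Channel-last layout, as expected by plotting libraries such as matplotlib
hwc = image.permute(1, 2, 0)     # (28, 28, 1)

# Drop the channel axis again to get back the 2D array
back = image.squeeze(0)          # (28, 28)

print(image.shape, batch.shape, hwc.shape, back.shape)
```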