Why does torchvision.utils.make_grid() return copies of the wanted grid?

Question

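In the code example below, I can't understand why the output tensor grid has shape 3,28,280. I understand why its height is 28 and its width is 280, but not the 3. From running plt.imshow(), it seems that all three 28x280 arrays along axis 0 are identical copies, since displaying any one of them gives me the image I want. I also don't understand why I can pass grid as an argument to plt.imshow(), since it is supposed to take a 2D array, not a 3D array, which grid clearly is.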

import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np

train_set = torchvision.datasets.FashionMNIST(
    root = './pytorch_obj_classifier/data/FashionMNIST',
    train = True,
    download = True,
    transform = transforms.Compose([
            transforms.ToTensor()
    ])
)
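# Assumed DataLoader: batch_size=10 matches the grid width of 280 = 10 * 28.
train_loader = torch.utils.data.DataLoader(train_set, batch_size=10)
sample = next(iter(train_loader))
image, label = sample
print(image.shape)  # torch.Size([10, 1, 28, 28])

grid = torchvision.utils.make_grid(image, padding=0, nrow=10)
print(grid.shape)  # torch.Size([3, 28, 280])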

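plt.figure(figsize=(15, 15))
# Move channels last: (3, 28, 280) -> (28, 280, 3)
grid = np.transpose(grid.numpy(), (1, 2, 0))
grid1 = grid[:, :, 0]
grid2 = grid[:, :, 1]
grid3 = grid[:, :, 2]
# Each call draws on the same axes; all four show the same picture.
plt.imshow(grid1, cmap='gray')
plt.imshow(grid2, cmap='gray')
plt.imshow(grid3, cmap='gray')
plt.imshow(grid, cmap='gray')
plt.show()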

Answer 1

FashionMNIST, like MNIST, consists of grayscale (single-channel) images. If you look at the implementation of torchvision.utils.make_grid, single-channel images have their channel replicated three times:

if tensor.dim() == 4 and tensor.size(1) == 1:  # single-channel images
    tensor = torch.cat((tensor, tensor, tensor), 1)

As for matplotlib.pyplot.imshow, it accepts either a 2D array or a 3D array whose last axis holds 3 or 4 channels. From the documentation:

The image data. Supported array shapes are:

(M, N): an image with scalar data. The data is visualized using a colormap.

(M, N, 3): an image with RGB values (0-1 float or 0-255 int).

(M, N, 4): an image with RGBA values (0-1 float or 0-255 int), i.e. including transparency.
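A small sketch of those shapes in practice, assuming random data and the non-interactive Agg backend so it runs without a display (the variable names are made up for the example). It also shows why the channels-first (3, M, N) output of make_grid has to be transposed before plotting:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, chosen only so the sketch runs anywhere
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
gray = rng.random((28, 280))                  # (M, N): scalar data
rgb = np.stack([gray, gray, gray], axis=-1)   # (M, N, 3): RGB with identical channels

fig, (ax_gray, ax_rgb) = plt.subplots(2, 1)
im_gray = ax_gray.imshow(gray, cmap="gray")   # the colormap is applied to scalar data
im_rgb = ax_rgb.imshow(rgb, cmap="gray")      # cmap is ignored for RGB(A) input

# A channels-first array (3, M, N) is rejected: the last axis must have size 3 or 4.
try:
    ax_rgb.imshow(np.transpose(rgb, (2, 0, 1)))
    channels_first_ok = True
except TypeError:
    channels_first_ok = False
print(channels_first_ok)
```

Because the three channels are equal, the RGB version renders as the same gray image as the 2D version, which is why passing the transposed grid directly works.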

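In general, we don't refer to the number of dimensions; we describe tensors by their shape (the size along each axis). In PyTorch, image tensors conventionally have three axes with shape (channel, height, width). This holds even for single-channel images: treat them as 3D tensors (1, height, width) rather than 2D tensors (height, width). That keeps things consistent with the multi-channel case, which comes up often (cf. convolutional neural networks).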
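As a rough illustration of that convention, the sketch below starts from a hypothetical 2D grayscale tensor and adds the channel and batch axes explicitly. The Conv2d layer and its sizes are arbitrary choices for the example, not something taken from the question:

```python
import torch

# Hypothetical grayscale image stored as a bare 2D tensor (height, width).
image_2d = torch.rand(28, 28)

# Add an explicit channel axis: (1, height, width).
image_chw = image_2d.unsqueeze(0)
print(image_chw.shape)  # torch.Size([1, 28, 28])

# Add a batch axis: (batch, channel, height, width), the layout conv layers expect.
batch = image_chw.unsqueeze(0)
conv = torch.nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3)
features = conv(batch)
print(features.shape)  # torch.Size([1, 4, 26, 26])

# For plotting, move the channel axis last: (height, width, channel).
image_hwc = image_chw.permute(1, 2, 0)
print(image_hwc.shape)  # torch.Size([28, 28, 1])
```

The same code works unchanged for a 3-channel image once in_channels is set to 3, which is the consistency the convention buys you.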

python

python-3.x

pytorch

torchvision