Change the image size and range

If I have an image of size 28 × 28 and I want to resize it to 32 × 32, I need a data transform. I know this can be done with transforms.Resize(), but I don't know how to use it.

Similarly for normalization: what should I do if I want the output in the range [-1, 1], and what if I want it in [0, 1]? Previously I used transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)).

Resize
This transform takes the desired output shape as its constructor argument:

transforms.Resize((32, 32))
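
A quick sanity check (a minimal sketch; the blank PIL image just stands in for a 28 × 28 input):

from PIL import Image
import numpy as np
import torchvision.transforms as transforms

# Hypothetical 28x28 grayscale image standing in for an MNIST digit
img = Image.fromarray(np.zeros((28, 28), dtype=np.uint8))

resized = transforms.Resize((32, 32))(img)
print(resized.size)  # (32, 32) -- PIL reports (width, height)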

Normalize
Since the Normalize transform works like out <- (in - mu) / sigma, you already have the mu and sigma values that project out onto the range [-1, 1]. To project onto [0, 1] instead, you need to multiply by 0.5 and add 0.5. Playing with the equation, 0.5 * (in - mu) / sigma + 0.5 = (in - (mu - sigma)) / (2 * sigma), so the new mean you need to provide is old_mean - old_sigma and the new sigma is 2 * old_sigma.
In your case:

transforms.Normalize((0.256, 0.232, 0.181),(0.458, 0.448, 0.45))
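
You can verify that algebra numerically (a minimal sketch; the random tensor is arbitrary):

import torch
import torchvision.transforms as transforms

x = torch.rand(3, 8, 8)  # arbitrary 3-channel image in [0, 1]

old = transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
new = transforms.Normalize((0.256, 0.232, 0.181), (0.458, 0.448, 0.450))

# Shifting and scaling the old output into [0, 1] matches the new transform
print(torch.allclose(old(x) * 0.5 + 0.5, new(x)))  # True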

If you want to normalize a single channel in the [0, 1] range into the [-1, 1] range, you need to subtract 0.5 and divide by 0.5:

transforms.Normalize((0.5,), (0.5,))
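
For example (a minimal check):

import torch
import torchvision.transforms as transforms

t = torch.tensor([[[0.0, 0.5, 1.0]]])  # single channel in [0, 1]
print(transforms.Normalize((0.5,), (0.5,))(t))  # tensor([[[-1., 0., 1.]]])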

Don't worry, it's going to be fine. Resizing MNIST to 32 × 32 (height × width) can be done like this:

import tempfile

import torchvision

dataset = torchvision.datasets.MNIST(
    root=tempfile.gettempdir(),
    download=True,
    train=True,
    # Simply put the size you want in Resize (can be tuple for height, width)
    transform=torchvision.transforms.Compose(
        [torchvision.transforms.Resize(32), torchvision.transforms.ToTensor()]
    ),
)

print(dataset[0][0].shape) # 1, 32, 32 (channels, height, width)

Regarding normalization, you can check PyTorch's per-channel normalization source here. It depends whether you want it per-channel or in another form, but something along those lines should work (see wikipedia for the normalization formula; here it is applied per channel):

import dataclasses
import typing

import torch


@dataclasses.dataclass
class Normalize:
    maximum: typing.Tuple
    minimum: typing.Tuple
    low: int = -1
    high: int = 1

    def __call__(self, tensor):
        # One value per channel, broadcast over the spatial dimensions
        maximum = torch.as_tensor(self.maximum, dtype=tensor.dtype, device=tensor.device)
        minimum = torch.as_tensor(self.minimum, dtype=tensor.dtype, device=tensor.device)
        return self.low + (
            (tensor - minimum[:, None, None]) * (self.high - self.low)
        ) / (maximum[:, None, None] - minimum[:, None, None])
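
A quick sanity check of this transform (assuming the imports above):

x = torch.tensor([[[0.0, 0.25, 1.0]]])  # one channel with min 0 and max 1
norm = Normalize(maximum=(1.0,), minimum=(0.0,))
print(norm(x))  # tensor([[[-1.0000, -0.5000, 1.0000]]])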

You have to provide a Tuple of minimum values and a Tuple of maximum values (one value per channel), just like for standard PyTorch's torchvision normalization. You can calculate those values from your data; for MNIST you could calculate them like this:

def per_channel_op(data, op=torch.max):
    # Reduce over samples, then over both spatial dimensions,
    # leaving one value per channel
    per_sample, _ = op(data, axis=0)
    per_width, _ = op(per_sample, axis=1)
    per_height, _ = op(per_width, axis=1)
    return per_height

# Unsqueeze to add the missing channel dimension for MNIST
# Divide by 255 because the images are uint8 by default
data = dataset.data.unsqueeze(1).float() / 255

# Maximum over samples
maximum = per_channel_op(data) # one value per channel
minimum = per_channel_op(data, op=torch.min) # only one value because MNIST is single-channel
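
For MNIST both extremes actually occur in the data, so this should give the full [0, 1] range:

print(maximum, minimum)  # tensor([1.]) tensor([0.])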

Finally, apply the normalization on MNIST (careful: these will contain only -1 and 1 values, since all the pixels are black and white; it would act differently on datasets like CIFAR):

dataset = torchvision.datasets.MNIST(
    root=tempfile.gettempdir(),
    download=True,
    train=True,
    # Simply put the size you want in Resize (can be tuple for height, width)
    transform=torchvision.transforms.Compose(
        [
            torchvision.transforms.Resize(32),
            torchvision.transforms.ToTensor(),
            # Apply your custom transformation with Lambda
            # (maximum and minimum are already one value per channel)
            torchvision.transforms.Lambda(Normalize(maximum, minimum)),
        ]
    ),
)
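
You can confirm that the values now land in [-1, 1] (a quick check; interpolation from Resize may leave the extremes just short of exact):

img, _ = dataset[0]
print(img.min().item(), img.max().item())  # approximately -1.0 and 1.0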