Different results with torchvision transforms

Question

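Correct me if I'm wrong. The 'classic' way to pass an image through torchvision transforms is to use Compose, as shown on its documentation page. However, this requires passing an Image as input. An alternative is to use ConvertImageDtype with torch.nn.Sequential. This bypasses the need for an Image, and in my case it is much faster because I work with numpy arrays.

My problem is that the results are not the same. Below is an example with a custom Normalize. I would like to use torch.nn.Sequential (tr) because it is faster for my needs, but compared to Compose (tr2) the error (sum of squared differences) is very large (~810).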

from PIL import Image
import torchvision.transforms as T
import numpy as np
import torch

o = np.random.rand(64, 64, 3) * 255
o = np.array(o, dtype=np.uint8)
i = Image.fromarray(o)

tr = torch.nn.Sequential(
    T.Resize(224, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(224),
    T.ConvertImageDtype(torch.float),
    T.Normalize([0.48145466, 0.4578275, 0.40821073], [0.26862954, 0.26130258, 0.27577711]),
)

tr2 = T.Compose([
    T.Resize(224, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),
])

out = tr(torch.from_numpy(o).permute(2,0,1).contiguous())

out2 = tr2(i)

print(((out - out2) ** 2).sum())

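The interpolation method seems to matter a lot: if I use the default BILINEAR, the error is ~7, but I need to use BICUBIC.

The problem seems to lie in ConvertImageDtype vs ToTensor, because if I replace ToTensor with ConvertImageDtype the results are the same (I cannot do it the other way around, because ToTensor is not a subclass of Module, so I cannot use it with nn.Sequential).

However, the following gives the same result: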

tr = torch.nn.Sequential(
    T.ConvertImageDtype(torch.float),
)

tr2 = T.Compose([
    T.ToTensor(),
])

out = tr(torch.from_numpy(o).permute(2,0,1).contiguous())

out2 = tr2(i)

print(((out - out2) ** 2).sum())

This means that the interpolation changes something in the result, and that it only matters when I use ToTensor vs ConvertImageDtype.

Any comments are welcome.

Answer 1

This is documented in the Resize documentation:

The output image might be different depending on its type: when downsampling, the interpolation of PIL images and tensors is slightly different, because PIL applies antialiasing. This may lead to significant differences in the performance of a network. Therefore, it is preferable to train and serve a model with the same input types. See also below the antialias parameter, which can help making the output of PIL images and tensors closer.

Passing antialias=True produces almost identical results. This is interesting, because the documentation says

it can be set to True for InterpolationMode.BILINEAR only mode.

However, I am using BICUBIC and it still works.
