Using skimage.transform.rescale twice on an image creates additional channels

In a Coursera guided project I am working on, the instructor uses

from skimage.transform import rescale
image_rescaled = rescale(rescale(image,0.5),2.0)

to distort the image.

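The error on my own machine (it did not appear in the project's Jupyter notebook, probably because of different module and Python versions) is that the number of channels of image_rescaled increases by 1.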

For example: images_normal.shape = (256, 256, 256, 3), but images_with_twice_reshape.shape = (256, 256, 256, 4).

The problem does not occur if I use rescale(rescale(image, 2.0), 0.5) instead.

Is this caused by a newer version of Python/skimage, or am I doing something wrong?

For further reference (nothing has been removed from the source code, but the important parts are highlighted with #s):

import os
import re
from scipy import ndimage, misc
from skimage.transform import resize, rescale
from matplotlib import pyplot
import numpy as np

def train_batches(just_load_dataset=False):

    batches = 256 # Number of images to have at the same time in a batch

    batch = 0 # Number of images in the current batch (grows over time and then resets for each batch)
    batch_nb = 0 # Batch current index
    
    ep = 4 # Number of epochs

    images = []
    x_train_n = []
    x_train_down = []
    
    x_train_n2 = [] # Resulting high res dataset
    x_train_down2 = [] # Resulting low res dataset
    
    for root, dirnames, filenames in os.walk("data/cars_train.nosync"):
        for filename in filenames:
                if re.search(r"\.(jpg|jpeg|JPEG|png|bmp|tiff)$", filename):
                filepath = os.path.join(root, filename)
                image = pyplot.imread(filepath)
                if len(image.shape) > 2:
                        
                    image_resized = resize(image, (256, 256)) # Resize the image so that every image is the same size
#########################
                    x_train_n.append(image_resized) # Add this image to the high res dataset
                    x_train_down.append(rescale(rescale(image_resized, 0.5), 2.0)) # Rescale it 0.5x and 2x so that it is a low res image but still has 256x256 resolution
########################
                    # >>>> x_train_down.append(rescale(rescale(image_resized, 2.0), 0.5)), this one works and gives the same shape of x_train_down and x_train_n.
########################
                    batch += 1
                    if batch == batches:
                        batch_nb += 1

                        x_train_n2 = np.array(x_train_n)
                        x_train_down2 = np.array(x_train_down)
                        
                        if just_load_dataset:
                            return x_train_n2, x_train_down2
                        
                        print('Training batch', batch_nb, '(', batches, ')')

                        autoencoder.fit(x_train_down2, x_train_n2,
                            epochs=ep,
                            batch_size=10,
                            shuffle=True,
                            validation_split=0.15)
                    
                        x_train_n = []
                        x_train_down = []
                    
                        batch = 0

    return x_train_n2, x_train_down2

With the code above, I get x_train_n2.shape = (256, 256, 256, 3) and x_train_down2.shape = (256, 256, 256, 4).

I was able to reproduce your problem as follows:

import numpy as np
from skimage.transform import resize, rescale

image = np.random.random((512, 512, 3))
resized = resize(image, (256, 256))
rescaled2x = rescale(
        rescale(resized, 0.5),
        2,
)
print(rescaled2x.shape)
# prints (256, 256, 4)

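The problem is that resize can infer that your last dimension is channels/RGB, because you give it a 2D output shape. rescale, on the other hand, treats your array as a 3D image of shape (256, 256, 3). It downsamples it to (128, 128, 2), interpolating along the colors as if they were another spatial dimension, and then upsamples it to (256, 256, 4).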
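The shape arithmetic can be checked without scikit-image at all. The sketch below assumes that, when every axis is treated as spatial, the output size of each axis is the rounded product of its length and the scale factor; rescaled_shape is a hypothetical helper written for this illustration, not a scikit-image function. It also suggests why upscaling first avoids the problem: 3 * 2.0 = 6 and 6 * 0.5 = 3, with no rounding involved.

```python
import numpy as np

def rescaled_shape(shape, scale):
    """Hypothetical helper: output shape if every axis is scaled and rounded."""
    return tuple(int(n) for n in np.round(np.asarray(shape) * scale))

start = (256, 256, 3)

# Downscale first: the channel axis goes 3 -> 1.5 -> rounds to 2 -> 4.
down_then_up = rescaled_shape(rescaled_shape(start, 0.5), 2.0)
print(down_then_up)  # (256, 256, 4)

# Upscale first: the channel axis goes 3 -> 6 -> 3, so the shape survives.
up_then_down = rescaled_shape(rescaled_shape(start, 2.0), 0.5)
print(up_then_down)  # (256, 256, 3)
```

Note that even when the shape happens to survive, the colors have still been interpolated across channels, so upscaling first only hides the issue.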

If you look at the rescale documentation, you will find the "multichannel" parameter, described as:

Whether the last axis of the image is to be interpreted as multiple channels or another spatial dimension.

So, updating my code:

rescaled2x = rescale(
        rescale(resized, 0.5, multichannel=True),
        2,
        multichannel=True,
)
print(rescaled2x.shape)
# prints (256, 256, 3)
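One caveat for readers on newer installations: as far as I know, scikit-image 0.19 deprecated multichannel in favor of a channel_axis argument, and later releases may no longer accept multichannel. The sketch below is one possible way to handle both cases; rescale_rgb is a hypothetical wrapper name, and the fallback assumes that an older release raises TypeError for the unknown channel_axis keyword.

```python
import numpy as np
from skimage.transform import resize, rescale

def rescale_rgb(image, scale):
    """Hypothetical wrapper: rescale the spatial axes only, keep the last axis as channels."""
    try:
        # Newer scikit-image: name the channel axis explicitly.
        return rescale(image, scale, channel_axis=-1)
    except TypeError:
        # Older scikit-image: fall back to the multichannel flag.
        return rescale(image, scale, multichannel=True)

image = np.random.random((512, 512, 3))
resized = resize(image, (256, 256))
rescaled2x = rescale_rgb(rescale_rgb(resized, 0.5), 2)
print(rescaled2x.shape)  # (256, 256, 3)
```

In the training loop from the question, the same wrapper could replace the nested rescale calls so that x_train_down keeps three channels.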