What is the 3 in numpy.resize(image,(IMG_HEIGHT,IMG_WIDTH,3))?

Question

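While trying to build an alphabet (letter) classifier in ML, I use the following code to create image data and labels from the images in a folder, using PIL.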

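import os

import numpy as np
from PIL import Image

IMG_HEIGHT = 32  # every image in the dataset is 32 x 32 pixels
IMG_WIDTH = 32

def create_dataset_PIL(img_folder):
    img_data_array = []
    class_name = []
    # each sub-folder name of img_folder is used as the class label
    for dir1 in os.listdir(img_folder):
        print(dir1)
        for file in os.listdir(os.path.join(img_folder, dir1)):
            image_path = os.path.join(img_folder, dir1, file)
            image = np.array(Image.open(image_path))
            image = np.resize(image, (IMG_HEIGHT, IMG_WIDTH, 3))
            image = image.astype('float32')
            image /= 255  # scale pixel values from [0, 255] to [0, 1]
            img_data_array.append(image)
            class_name.append(dir1)
    return img_data_array, class_name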

Every image in the dataset is already 32 x 32 pixels, and I am resizing it into a 32 x 32 x 3 array. But I don't understand what this third dimension is when I only need 32 x 32 pixels.

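I stumbled upon this, where I learned that it might be the interpolation parameter. I also learned from YouTube that interpolation is needed when resizing images. But I don't know what to do with this extra data. Should the input layer of my neural network now be 32 x 32 x 3 instead of 32 x 32?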

Answer 1

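The 3 stands for the RGB (Red-Green-Blue) values. Each pixel of the image is represented by 3 values instead of one. In a black-and-white (grayscale) image, each pixel would be represented as [pixel]; in an RGB image, each pixel is represented as [pixel(R), pixel(G), pixel(B)].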
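A minimal NumPy sketch of the difference, assuming zero-filled placeholder arrays in place of real images: a grayscale image has one value per pixel, while an RGB image adds a third axis of length 3.

```python
import numpy as np

# Hypothetical placeholder arrays standing in for one 32 x 32 image in each mode
gray = np.zeros((32, 32), dtype=np.uint8)    # one value per pixel
rgb = np.zeros((32, 32, 3), dtype=np.uint8)  # three values (R, G, B) per pixel

print(gray.shape)       # (32, 32)
print(rgb.shape)        # (32, 32, 3)
print(gray[0, 0])       # a single intensity value
print(rgb[0, 0])        # three values: [R G B]
print(rgb[0, 0].shape)  # (3,)
```

The RGB array therefore holds 32 * 32 * 3 = 3072 numbers, three times as many as the grayscale one.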

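In fact, every pixel of the image has 3 RGB values. For 8-bit images these range from 0 to 255 and represent the intensity of red, green, and blue. Higher values represent higher intensity and lower values represent lower intensity. For example, a pixel could be represented as the list of three values [78, 136, 60]. Black is represented as [0, 0, 0], i.e. zero intensity in every channel.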
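As a small sketch, here is the example pixel from above together with black and white, scaled the same way the question's code does it (`astype('float32')` followed by division by 255). The `black` and `white` arrays are illustrative additions.

```python
import numpy as np

# The example pixel from the text, stored as 8-bit values in [0, 255]
pixel = np.array([78, 136, 60], dtype=np.uint8)
black = np.array([0, 0, 0], dtype=np.uint8)        # no intensity in any channel
white = np.array([255, 255, 255], dtype=np.uint8)  # full intensity in every channel

# Same normalization as the question's code: convert to float32, divide by 255
pixel_scaled = pixel.astype("float32") / 255
black_scaled = black.astype("float32") / 255
white_scaled = white.astype("float32") / 255

print(pixel_scaled)  # each channel now lies in [0.0, 1.0]
print(black_scaled)  # all zeros
print(white_scaled)  # all ones
```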

Yes: your input layer should match this 32 x 32 x 3 shape.
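To make "match 32 x 32 x 3" concrete, the sketch below stacks a list like the one `create_dataset_PIL` returns into a single batch array. Random data stands in for real images (an assumption for illustration). A convolutional input layer takes the per-image shape (32, 32, 3), e.g. `keras.Input(shape=(32, 32, 3))` in Keras, while a fully connected input layer on flattened images takes 3072 values.

```python
import numpy as np

IMG_HEIGHT, IMG_WIDTH = 32, 32

# Hypothetical stand-in for img_data_array: 10 RGB images already scaled to [0, 1]
img_data_array = [np.random.rand(IMG_HEIGHT, IMG_WIDTH, 3).astype("float32")
                  for _ in range(10)]

# Stack the list into one batch: (num_images, height, width, channels)
batch = np.stack(img_data_array)
print(batch.shape)  # (10, 32, 32, 3)

# The per-image shape is what a convolutional input layer expects
input_shape = batch.shape[1:]
print(input_shape)  # (32, 32, 3)

# A dense input layer on flattened images sees 32 * 32 * 3 = 3072 values
flat = batch.reshape(len(batch), -1)
print(flat.shape)   # (10, 3072)
```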

Answer 2

In a digital image, the 3rd dimension holds the information about the color present at the pixel at coordinate (x, y); it is also called the color channel.

The most common channel types:

RGB mode: the value is 3, e.g. image_shape: [32, 32, 3]
Grayscale mode: the value is 1, e.g. image_shape: [32, 32, 1]
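One caution about the question's interpolation guess: `np.resize` does not interpolate. It flattens the array and repeats (or truncates) the data to fill the requested shape. So if a source image is grayscale, `np.resize(image, (32, 32, 3))` does not produce real color channels; it mixes values from different pixels. The sketch below uses a tiny hypothetical 2 x 2 image to show this, and contrasts it with copying the single channel three times. For genuine resampling, PIL's `Image.resize` is the usual tool, and `Image.open(path).convert('RGB')` is one way to guarantee three channels before converting to NumPy.

```python
import numpy as np

# A tiny hypothetical 2 x 2 grayscale "image" with distinct pixel values
gray = np.array([[0, 1],
                 [2, 3]], dtype=np.uint8)

# np.resize flattens the data and repeats it to fill the new shape;
# it neither interpolates nor creates meaningful color channels.
resized = np.resize(gray, (2, 2, 3))
print(resized[0, 0])  # [0 1 2] -> values taken from three different pixels

# Copying the single channel into three channels keeps each pixel intact.
stacked = np.stack([gray, gray, gray], axis=-1)
print(stacked[0, 0])  # [0 0 0]

# Adding an explicit channel axis gives the [H, W, 1] grayscale layout.
gray_1ch = gray[..., np.newaxis]
print(gray_1ch.shape)  # (2, 2, 1)
```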

If your ML model does not need color features, you can convert the images to grayscale with rgb2gray from scikit-image.
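With scikit-image installed, `from skimage.color import rgb2gray` does the conversion directly. The NumPy-only sketch below shows what such a conversion computes: a weighted sum of the R, G and B channels. The coefficients are the ones documented for `rgb2gray`; treat them as an assumption and check your installed version's documentation. The helper name `to_grayscale` is hypothetical.

```python
import numpy as np

def to_grayscale(rgb):
    """Hypothetical helper: weighted sum of R, G, B (rgb2gray-style coefficients)."""
    weights = np.array([0.2125, 0.7154, 0.0721])
    # (H, W, 3) @ (3,) -> (H, W): one luminance value per pixel
    return rgb[..., :3] @ weights

rgb = np.random.rand(32, 32, 3)  # hypothetical RGB image scaled to [0, 1]
gray = to_grayscale(rgb)
print(gray.shape)                # (32, 32)

# Restore a channel axis if the model expects input of shape [32, 32, 1]
gray = gray[..., np.newaxis]
print(gray.shape)                # (32, 32, 1)
```

Because the weights sum to 1, a white pixel (1.0 in every channel) maps to 1.0 and black stays 0.0.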

You can learn more about working with images in NumPy here.


Tags: python, interpolation, machine-learning, image-processing, neural-network