Use a preprocessing function that changes the input size on ImageDataGenerator

Question

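I want to apply an FFT to an input dataset loaded with ImageDataGenerator. Taking the FFT doubles the number of channels, because I stack the real and imaginary parts of the complex FFT output along the channel dimension. The preprocessing_function attribute of the ImageDataGenerator class must return a NumPy tensor with the same shape as its input, so I can't use it. I tried applying tf.math.fft2d directly to the output of ImageDataGenerator.flow_from_directory(), but it consumed too much RAM and crashed the program on Google Colab. Another approach I tried was adding a custom layer that computes the FFT as the first layer of my neural network, but this increases training time. So I would like to do it as a preprocessing step. Can anyone suggest an efficient way to apply a function on ImageDataGenerator?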

Answer 1

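You can customize ImageDataGenerator, but I see no reason to think that would be faster than doing the FFT in the first layer. It looks like a costly operation, because tf.signal.fft2d requires complex64 or complex128 dtypes. So the data has to be cast to complex and then cast back, since neural network weights are tf.float32 and other image-processing functions don't accept complex dtypes.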
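One possible reading of "customize", offered only as a sketch: leave ImageDataGenerator untouched and wrap the iterator it returns, so the FFT runs on one batch at a time rather than on the whole dataset. The names `fft_to_channels` and `fft_batches` are hypothetical, and the code assumes batches arrive as `(images, labels)` with images shaped `(N, H, W, C)`.

```python
import numpy as np

def fft_to_channels(batch):
    """Map a float batch (N, H, W, C) to a float32 batch (N, H, W, 2C)."""
    # 2-D FFT over the spatial axes only (height and width).
    spectrum = np.fft.fft2(batch, axes=(1, 2))
    # Stack real and imaginary parts along the channel axis.
    stacked = np.concatenate([spectrum.real, spectrum.imag], axis=-1)
    return stacked.astype(np.float32)

def fft_batches(generator):
    """Yield (fft_images, labels) for every (images, labels) batch."""
    for images, labels in generator:
        yield fft_to_channels(images), labels
```

A wrapper like this could be placed around the result of `flow_from_directory(...)`. Because a plain Python generator has no length, `steps_per_epoch` would probably need to be passed to `model.fit` explicitly.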

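# A tf.data pipeline that applies the FFT while loading, instead of ImageDataGenerator.
import os
import tensorflow as tf

labels = ['Cats', 'Dogs', 'Others']

def read_image(file_name):
  # Load and decode the JPEG, scale to float32 in [0, 1], then pad/resize to 224x224.
  image = tf.io.read_file(file_name)
  image = tf.image.decode_jpeg(image, channels=3)
  image = tf.image.convert_image_dtype(image, tf.float32)
  image = tf.image.resize_with_pad(image, target_height=224, target_width=224)
  # tf.signal.fft2d transforms the two innermost axes, so move channels first: (C, H, W).
  image = tf.transpose(image, perm=[2, 0, 1])
  image = tf.cast(image, tf.complex64)
  image = tf.signal.fft2d(image)
  # Back to (H, W, C). The result is still complex64 at this point.
  image = tf.transpose(image, perm=[1, 2, 0])
  # The class name is the parent directory of the file.
  label = tf.strings.split(file_name, os.sep)[-2]
  # Index of the matching class name in `labels`.
  label = tf.where(tf.equal(label, labels))
  return image, label

ds = tf.data.Dataset.list_files(r'path\to\my\pictures\*\*.jpg')

ds = ds.map(read_image)

next(iter(ds))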
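
The question asks for the real and imaginary parts stacked along the channel dimension, which means converting the complex spectrum back to float32. Below is a minimal sketch of that step inside a tf.data pipeline. The helper name `fft_stack` is hypothetical, and the random tensors stand in for decoded images and labels.

```python
import tensorflow as tf

def fft_stack(image):
    """Map a float32 image (H, W, C) to a float32 tensor (H, W, 2C)."""
    # fft2d works on the two innermost axes, so put channels first.
    chw = tf.transpose(image, perm=[2, 0, 1])
    spectrum = tf.signal.fft2d(tf.cast(chw, tf.complex64))
    # Restore the (H, W, C) layout.
    spectrum = tf.transpose(spectrum, perm=[1, 2, 0])
    # Real and imaginary parts become separate float32 channels.
    return tf.concat([tf.math.real(spectrum), tf.math.imag(spectrum)], axis=-1)

# Placeholder data: four random 8x8 RGB "images" with integer labels.
images = tf.random.uniform([4, 8, 8, 3])
targets = tf.constant([0, 1, 2, 0])

ds = tf.data.Dataset.from_tensor_slices((images, targets))
ds = ds.map(lambda img, lbl: (fft_stack(img), lbl),
            num_parallel_calls=tf.data.AUTOTUNE)
ds = ds.batch(2).prefetch(tf.data.AUTOTUNE)
```

Mapping per element and prefetching keeps only a few batches in memory at once, which may avoid the RAM problem described in the question. Whether it is faster than an FFT layer inside the model would need to be measured.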

