Using tf.data.Dataset as training input to Keras model NOT working

I have a simple piece of code that does work for training a Keras model in Tensorflow using numpy arrays as features and labels. If I then wrap these numpy arrays using tf.data.Dataset.from_tensor_slices in order to train the same Keras model using a tensorflow dataset, I get an error. I haven't been able to figure out why (it may be a tensorflow or keras bug, but I may also be missing something). I am on Python 3, tensorflow is 1.10.0, numpy is 1.14.5, no GPU involved.

OBS1: The possibility of using tf.data.Dataset as input to Keras is shown in https://www.tensorflow.org/guide/keras, under "Input tf.data datasets".

OBS2: In the code below, the code under "#Train with numpy arrays" is the one being executed, using the numpy arrays. If this code is commented out and the code under "#Train with tf.data datasets" is used instead, the error is reproduced.

OBS3: Line 13 of the code, which is commented out and starts with "###WORKAROUND 1###", changes the error if uncommented and used for the tf.data.Dataset inputs, although I cannot fully understand why.

The complete code is:

import tensorflow as tf
import numpy as np

np.random.seed(1)
tf.set_random_seed(1)

print(tf.__version__)
print(np.__version__)

#Import mnist dataset as numpy arrays
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0 #normalizing
###WORKAROUND 1###y_train, y_test = (y_train.astype(dtype='float32'), y_test.astype(dtype='float32'))

x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1]*x_train.shape[2])) #reshaping 28 x 28 images to 1D vectors, similar to Flatten layer in Keras

batch_size = 32
#Create a tf.data.Dataset object equivalent to this data
tfdata_dataset_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
tfdata_dataset_train = tfdata_dataset_train.batch(batch_size).repeat()

#Creates model
keras_model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2, seed=1),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])

#Compile the model
keras_model.compile(optimizer='adam',
                    loss=tf.keras.losses.sparse_categorical_crossentropy,
                    metrics=['accuracy'])

#Train with numpy arrays
keras_training_history = keras_model.fit(x_train,
                y_train,
                initial_epoch=0,
                epochs=1,
                batch_size=batch_size
                )

#Train with tf.data datasets
#keras_training_history = keras_model.fit(tfdata_dataset_train,
#                initial_epoch=0,
#                epochs=1,
#                steps_per_epoch=60000//batch_size
#                )

print(keras_training_history.history)

The error observed when using tf.data.Dataset as input is:

(...)
ValueError: Tensor conversion requested dtype uint8 for Tensor with dtype float32: 'Tensor("metrics/acc/Cast:0", shape=(?,), dtype=float32)'

During handling of the above exception, another exception occurred:

(...)
TypeError: Input 'y' of 'Equal' Op has type float32 that does not match type uint8 of argument 'x'.
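
The first traceback points at a dtype mismatch: the labels arrive from the dataset as uint8, while the accuracy metric compares them against float32 predictions. Purely as an illustrative sketch (not a confirmed fix for this 1.10 issue), the dtypes could be made consistent inside the pipeline before batching:

#Hypothetical variant of the dataset construction above: cast images to
#float32 and labels to int32 so the metric comparison sees consistent dtypes.
tfdata_dataset_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
tfdata_dataset_train = tfdata_dataset_train.map(
    lambda x, y: (tf.cast(x, tf.float32), tf.cast(y, tf.int32)))
tfdata_dataset_train = tfdata_dataset_train.batch(batch_size).repeat()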

The error when uncommenting line 13, as noted in OBS3 above, is:

(...)
tensorflow.python.framework.errors_impl.InvalidArgumentError: In[0] is not a matrix
     [[Node: dense/MatMul = MatMul[T=DT_FLOAT, _class=["loc:@training/Adam/gradients/dense/MatMul_grad/MatMul_1"], transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_sequential_input_0_0, dense/MatMul/ReadVariableOp)]]

Any help would be appreciated, including comments that you were able to reproduce the errors, so that I can report a bug if that is the case.

I just upgraded to Tensorflow 1.10 to execute this code. I think that is the answer, which is also discussed in the other thread.

This code executes, but only if I remove the normalization, as that line seems to use too much CPU memory; I saw messages indicating that. I also reduced the number of cores.
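
A side note on that memory issue: dividing the uint8 MNIST arrays by 255.0 up front materializes float64 copies of the entire dataset (an 8x size increase per array), which is likely what exhausts memory. A minimal sketch of an alternative, assuming the scaling is still wanted, is to normalize per element inside the preprocessing function, so only one batch is converted at a time:

def preprocess_fn(image, label):
    #Cast and scale per element inside the pipeline; no float64 copy of the
    #full dataset is ever created.
    x = tf.reshape(tf.cast(image, tf.float32) / 255.0, (28, 28, 1))
    y = tf.one_hot(tf.cast(label, tf.uint8), NUM_CLASSES)
    return x, y

With that aside, the complete code is: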

import tensorflow as tf
import numpy as np
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout, Input

np.random.seed(1)
tf.set_random_seed(1)

batch_size = 128
NUM_CLASSES = 10

print(tf.__version__)

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
#x_train, x_test = x_train / 255.0, x_test / 255.0 #normalizing

def tfdata_generator(images, labels, is_training, batch_size=128):
    '''Construct a data generator using tf.Dataset'''

    def preprocess_fn(image, label):
        '''A transformation function to preprocess raw data
        into trainable input. '''
        x = tf.reshape(tf.cast(image, tf.float32), (28, 28, 1))
        y = tf.one_hot(tf.cast(label, tf.uint8), NUM_CLASSES)
        return x, y

    dataset = tf.data.Dataset.from_tensor_slices((images, labels))
    if is_training:
        dataset = dataset.shuffle(1000)  # depends on sample size

    # Transform and batch data at the same time
    dataset = dataset.apply(tf.contrib.data.map_and_batch(
        preprocess_fn, batch_size,
        num_parallel_batches=2,  # cpu cores
        drop_remainder=True if is_training else False))
    dataset = dataset.repeat()
    dataset = dataset.prefetch(tf.contrib.data.AUTOTUNE)

    return dataset

training_set = tfdata_generator(x_train, y_train, is_training=True, batch_size=batch_size)
testing_set  = tfdata_generator(x_test, y_test, is_training=False, batch_size=batch_size)

inputs = Input(shape=(28, 28, 1))
x = Conv2D(32, (3, 3), activation='relu', padding='valid')(inputs)
x = MaxPool2D(pool_size=(2, 2))(x)
x = Conv2D(64, (3, 3), activation='relu')(x)
x = MaxPool2D(pool_size=(2, 2))(x)
x = Flatten()(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)
outputs = Dense(NUM_CLASSES, activation='softmax')(x)

keras_model =  tf.keras.Model(inputs, outputs)

#Compile the model
keras_model.compile('adam', 'categorical_crossentropy', metrics=['acc'])

#Train with tf.data datasets
keras_training_history = keras_model.fit(
                            training_set.make_one_shot_iterator(),
                            steps_per_epoch=len(x_train) // batch_size,
                            epochs=5,
                            validation_data=testing_set.make_one_shot_iterator(),
                            validation_steps=len(x_test) // batch_size,
                            verbose=1)
print(keras_training_history.history)

Installing the tf-nightly build, together with changing the data types of some tensors (the error changes after installing tf-nightly), solved the problem, so it is an issue which (hopefully) will be solved in 1.11.

Related material: https://github.com/tensorflow/tensorflow/issues/21894

I am wondering how Keras is able to do 5 epochs when make_one_shot_iterator() only supports iterating once through the dataset?

It could be something like iterations = len(y_train) * epochs - here shown for tf.v1. (Note that the pipeline above calls dataset.repeat() with no count, so the one-shot iterator yields batches indefinitely and never runs out.)
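
A minimal sketch (tf.v1, toy data) of why the one-shot iterator does not run out:

import tensorflow as tf

#Toy dataset of 4 elements; repeat() with no count loops it indefinitely.
ds = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4]).repeat()
next_elem = ds.make_one_shot_iterator().get_next()

with tf.Session() as sess:
    #Ten pulls succeed even though the dataset has only 4 elements; without
    #repeat(), the fifth run would raise tf.errors.OutOfRangeError.
    print([sess.run(next_elem) for _ in range(10)])  # [1, 2, 3, 4, 1, 2, 3, 4, 1, 2]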

The code from Mohan Radhakrishnan still works in tf.v2, with small fixes for objects that now belong to new classes (in tf.v2) - bringing the code up-to-date... make_one_shot_iterator() is no longer needed.

# >> author: Mohan Radhakrishnan

import tensorflow as tf
import tensorflow.keras
import numpy as np
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout, Input

np.random.seed(1)
tf.random.set_seed(1)

batch_size = 128
NUM_CLASSES = 10

print(tf.__version__)

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
#x_train, x_test = x_train / 255.0, x_test / 255.0 #normalizing

def tfdata_generator(images, labels, is_training, batch_size=128):
    '''Construct a data generator using tf.Dataset'''

    def preprocess_fn(image, label):
        '''A transformation function to preprocess raw data
        into trainable input. '''
        x = tf.reshape(tf.cast(image, tf.float32), (28, 28, 1))
        y = tf.one_hot(tf.cast(label, tf.uint8), NUM_CLASSES)
        return x, y

    dataset = tf.data.Dataset.from_tensor_slices((images, labels))
    if is_training:
        dataset = dataset.shuffle(1000)  # depends on sample size

    # Transform and batch data at the same time
    dataset = dataset.apply(tf.data.experimental.map_and_batch(
        preprocess_fn, batch_size,
        num_parallel_batches=2,  # cpu cores
        drop_remainder=True if is_training else False))
    dataset = dataset.repeat()
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)

    return dataset

training_set = tfdata_generator(x_train, y_train, is_training=True, batch_size=batch_size)
testing_set  = tfdata_generator(x_test, y_test, is_training=False, batch_size=batch_size)

inputs = Input(shape=(28, 28, 1))
x = Conv2D(32, (3, 3), activation='relu', padding='valid')(inputs)
x = MaxPool2D(pool_size=(2, 2))(x)
x = Conv2D(64, (3, 3), activation='relu')(x)
x = MaxPool2D(pool_size=(2, 2))(x)
x = Flatten()(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)
outputs = Dense(NUM_CLASSES, activation='softmax')(x)

keras_model =  tf.keras.Model(inputs, outputs)

#Compile the model
keras_model.compile('adam', 'categorical_crossentropy', metrics=['acc'])

#Train with tf.data datasets
# training_set.make_one_shot_iterator() - 'PrefetchDataset' object has no attribute 'make_one_shot_iterator'
keras_training_history = keras_model.fit(
                            training_set,
                            steps_per_epoch=len(x_train) // batch_size,
                            epochs=5,
                            validation_data=testing_set,
                            validation_steps=len(x_test) // batch_size,
                            verbose=1)
print(keras_training_history.history)

Not loading the data locally, just a simple DataFlow - that's very handy - thank you a lot - and I hope my corrections are correct.