ValueError: None values not supported. Code working properly on CPU/GPU but not on TPU

I am trying to train a seq2seq language translation model, copy-pasting the code from this Kaggle Notebook on Google Colab. The code works fine on CPU and GPU, but it gives me errors while training on a TPU. This same question has already been asked here.

Here is my code:

    strategy = tf.distribute.experimental.TPUStrategy(resolver)
    
    with strategy.scope():
      model = create_model()
      model.compile(optimizer = 'rmsprop', loss = 'categorical_crossentropy')
    
    model.fit_generator(generator = generate_batch(X_train, y_train, batch_size = batch_size),
                        steps_per_epoch = train_samples // batch_size,
                        epochs = epochs,
                        validation_data = generate_batch(X_test, y_test, batch_size = batch_size),
                        validation_steps = val_samples // batch_size)
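
For reference, resolver comes from the standard Colab TPU initialization, roughly along these lines (a sketch, not copied verbatim from my notebook):

    import tensorflow as tf

    # Standard Colab TPU setup (sketch): tpu='' lets Colab auto-detect the TPU address.
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)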

Traceback:

Epoch 1/2
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-60-940fe0ee3c8b> in <module>()
      3                     epochs = epochs,
      4                     validation_data = generate_batch(X_test, y_test, batch_size = batch_size),
----> 5                     validation_steps = val_samples // batch_size)

10 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
    992           except Exception as e:  # pylint:disable=broad-except
    993             if hasattr(e, "ag_error_metadata"):
--> 994               raise e.ag_error_metadata.to_exception(e)
    995             else:
    996               raise

ValueError: in user code:
    /usr/local/lib/python3.7/dist-packages/keras/engine/training.py:853 train_function  *
    return step_function(self, iterator)
    /usr/local/lib/python3.7/dist-packages/keras/engine/training.py:842 step_function  **
    outputs = model.distribute_strategy.run(run_step, args=(data,))
...
ValueError: None values not supported.

I cannot figure out the error; I think it is caused by this generate_batch function:

    import numpy as np
    from sklearn.model_selection import train_test_split

    X, y = lines['english_sentence'], lines['hindi_sentence']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 34)

    def generate_batch(X = X_train, y = y_train, batch_size = 128):
        while True:
            for j in range(0, len(X), batch_size):
                # One batch of encoder/decoder inputs plus one-hot decoder targets
                encoder_input_data = np.zeros((batch_size, max_length_src), dtype='float32')
                decoder_input_data = np.zeros((batch_size, max_length_tar), dtype='float32')
                decoder_target_data = np.zeros((batch_size, max_length_tar, num_decoder_tokens), dtype='float32')

                for i, (input_text, target_text) in enumerate(zip(X[j:j + batch_size], y[j:j + batch_size])):
                    for t, word in enumerate(input_text.split()):
                        encoder_input_data[i, t] = input_token_index[word]
                    for t, word in enumerate(target_text.split()):
                        if t < len(target_text.split()) - 1:
                            decoder_input_data[i, t] = target_token_index[word]
                        if t > 0:
                            # Decoder target is the decoder input shifted by one timestep
                            decoder_target_data[i, t - 1, target_token_index[word]] = 1.
                yield([encoder_input_data, decoder_input_data], decoder_target_data)
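
One quick sanity check (a hypothetical snippet, not part of my notebook) is to pull a single batch from the generator and inspect what it yields:

    # Hypothetical check: does the generator itself yield plain NumPy arrays?
    (enc_batch, dec_batch), target_batch = next(generate_batch(X_train, y_train, batch_size=128))
    print(enc_batch.shape, enc_batch.dtype)   # expected (128, max_length_src), float32
    print(dec_batch.shape, dec_batch.dtype)   # expected (128, max_length_tar), float32
    print(target_batch.shape)                 # expected (128, max_length_tar, num_decoder_tokens)

If those shapes and dtypes look right, the generator output itself seems fine, and the problem is more likely in how fit_generator feeds it to the TPU.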

My Colab notebook - here
Kaggle dataset - here
TensorFlow version - 2.6

Edit - Please don't tell me to downgrade TensorFlow/Keras to 1.x. I can downgrade it to TensorFlow 2.0, 2.1, or 2.3, but not to 1.x. I don't understand TensorFlow 1.x, and besides, using a version from 3 years ago makes no sense.

It would need a downgrade to Keras 1.0.2; if that works, great, otherwise I will suggest other solutions.

You need to update Keras and your problem will be solved.

As mentioned in the referenced answer at the link you provided, the tensorflow.data API works better with TPUs. To adapt it to your case, try using return instead of yield in the generate_batch function:

    def generate_batch(X = X_train, y = y_train, batch_size = 128):
        ...
        return encoder_input_data, decoder_input_data, decoder_target_data

    encoder_input_data, decoder_input_data, decoder_target_data = generate_batch(X_train, y_train, batch_size=128)
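
Spelled out, one way the return-based function could look is the sketch below (an assumption on my part, reusing max_length_src, max_length_tar, num_decoder_tokens, input_token_index and target_token_index from the question; batch_size is kept only for signature compatibility, since batching is now left to tf.data):

    import numpy as np

    # Sketch: build the arrays for the whole split in one pass and return them.
    def generate_batch(X = X_train, y = y_train, batch_size = 128):
        n = len(X)
        encoder_input_data = np.zeros((n, max_length_src), dtype='float32')
        decoder_input_data = np.zeros((n, max_length_tar), dtype='float32')
        decoder_target_data = np.zeros((n, max_length_tar, num_decoder_tokens), dtype='float32')

        for i, (input_text, target_text) in enumerate(zip(X, y)):
            for t, word in enumerate(input_text.split()):
                encoder_input_data[i, t] = input_token_index[word]
            for t, word in enumerate(target_text.split()):
                if t < len(target_text.split()) - 1:
                    decoder_input_data[i, t] = target_token_index[word]
                if t > 0:
                    # Target is the decoder input shifted one step left, one-hot encoded
                    decoder_target_data[i, t - 1, target_token_index[word]] = 1.
        return encoder_input_data, decoder_input_data, decoder_target_data

Note that materializing the one-hot decoder_target_data for a whole split can use a lot of memory; if that becomes a problem, building it chunk by chunk or switching to sparse targets with sparse_categorical_crossentropy are possible workarounds.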

Then build your data with tensorflow.data:

    from tensorflow.data import Dataset

    encoder_input_data = Dataset.from_tensor_slices(encoder_input_data)
    decoder_input_data = Dataset.from_tensor_slices(decoder_input_data)
    decoder_target_data = Dataset.from_tensor_slices(decoder_target_data)
    ds = Dataset.zip((encoder_input_data, decoder_input_data, decoder_target_data)).map(map_fn).batch(1024)

where map_fn is defined as:

    def map_fn(encoder_input, decoder_input, decoder_target):
        return (encoder_input, decoder_input), decoder_target
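
One more TPU-specific detail that often matters: TPUs want static shapes, so batching with drop_remainder=True (every batch then has the same size) is usually the safer choice. A sketch of the same pipeline with that tweak:

    # Sketch: drop the last partial batch so the batch dimension stays static on TPU.
    ds = (Dataset.zip((encoder_input_data, decoder_input_data, decoder_target_data))
          .map(map_fn)
          .batch(1024, drop_remainder=True))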

Finally, use Model.fit instead of Model.fit_generator:

    model.fit(x=ds, epochs=epochs)
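
If you also want validation during training, the test split can be wrapped the same way; a sketch under the same assumptions (val_ds and the *_val names are mine, not from the question):

    # Hypothetical validation pipeline built with the same helpers.
    enc_val, dec_val, tgt_val = generate_batch(X_test, y_test, batch_size=128)
    val_ds = (Dataset.zip((Dataset.from_tensor_slices(enc_val),
                           Dataset.from_tensor_slices(dec_val),
                           Dataset.from_tensor_slices(tgt_val)))
              .map(map_fn)
              .batch(1024, drop_remainder=True))

    model.fit(ds, validation_data=val_ds, epochs=epochs)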