ValueError: None values not supported. Code working properly on CPU/GPU but not on TPU
I am trying to train a seq2seq language-translation model, copy-pasting the code from this Kaggle Notebook on Google Colab. The code works fine on CPU and GPU, but it gives me errors while training on a TPU. The same question has already been asked here.
Here is my code:
strategy = tf.distribute.experimental.TPUStrategy(resolver)

with strategy.scope():
    model = create_model()
    model.compile(optimizer = 'rmsprop', loss = 'categorical_crossentropy')

model.fit_generator(generator = generate_batch(X_train, y_train, batch_size = batch_size),
                    steps_per_epoch = train_samples // batch_size,
                    epochs = epochs,
                    validation_data = generate_batch(X_test, y_test, batch_size = batch_size),
                    validation_steps = val_samples // batch_size)
Traceback:
Epoch 1/2
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-60-940fe0ee3c8b> in <module>()
3 epochs = epochs,
4 validation_data = generate_batch(X_test, y_test, batch_size = batch_size),
----> 5 validation_steps = val_samples // batch_size)
10 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
992 except Exception as e: # pylint:disable=broad-except
993 if hasattr(e, "ag_error_metadata"):
--> 994 raise e.ag_error_metadata.to_exception(e)
995 else:
996 raise
ValueError: in user code:
/usr/local/lib/python3.7/dist-packages/keras/engine/training.py:853 train_function *
return step_function(self, iterator)
/usr/local/lib/python3.7/dist-packages/keras/engine/training.py:842 step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
...
ValueError: None values not supported.
I cannot figure out the error, but I think it is caused by this generate_batch function:
X, y = lines['english_sentence'], lines['hindi_sentence']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 34)

def generate_batch(X = X_train, y = y_train, batch_size = 128):
    while True:
        for j in range(0, len(X), batch_size):
            encoder_input_data = np.zeros((batch_size, max_length_src), dtype='float32')
            decoder_input_data = np.zeros((batch_size, max_length_tar), dtype='float32')
            decoder_target_data = np.zeros((batch_size, max_length_tar, num_decoder_tokens), dtype='float32')
            for i, (input_text, target_text) in enumerate(zip(X[j:j + batch_size], y[j:j + batch_size])):
                for t, word in enumerate(input_text.split()):
                    encoder_input_data[i, t] = input_token_index[word]
                for t, word in enumerate(target_text.split()):
                    if t < len(target_text.split()) - 1:
                        decoder_input_data[i, t] = target_token_index[word]
                    if t > 0:
                        decoder_target_data[i, t - 1, target_token_index[word]] = 1.
            yield([encoder_input_data, decoder_input_data], decoder_target_data)
My Colab notebook - here
Kaggle dataset - here
TensorFlow version - 2.6
Edit - Please don't tell me to downgrade TensorFlow/Keras to 1.x. I can downgrade to TensorFlow 2.0, 2.1, or 2.3, but not to 1.x; I don't understand TensorFlow 1.x, and there is no point in using a version that is three years old.
You would need to downgrade to Keras 1.0.2. If that works, great; otherwise I will suggest other solutions.
You need to update Keras and your problem will be solved.
As mentioned in the answer at the link you provided, the tensorflow.data API works better with TPUs. To adapt it to your case, try using return instead of yield in the generate_batch function:
def generate_batch(X = X_train, y = y_train, batch_size = 128):
    ...
    return encoder_input_data, decoder_input_data, decoder_target_data

encoder_input_data, decoder_input_data, decoder_target_data = generate_batch(X_train, y_train, batch_size=128)
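Since the body of the rewritten function is elided above, here is one possible way it could look: a sketch that simply reuses the tokenization variables from the question (max_length_src, max_length_tar, num_decoder_tokens, input_token_index, target_token_index) and builds the arrays for the whole split in one pass instead of yielding per-batch slices.

import numpy as np

def generate_batch(X = X_train, y = y_train, batch_size = 128):
    # batch_size is kept only to match the original signature;
    # batching now happens later in tf.data, not here.
    n = len(X)
    encoder_input_data = np.zeros((n, max_length_src), dtype='float32')
    decoder_input_data = np.zeros((n, max_length_tar), dtype='float32')
    decoder_target_data = np.zeros((n, max_length_tar, num_decoder_tokens), dtype='float32')
    for i, (input_text, target_text) in enumerate(zip(X, y)):
        for t, word in enumerate(input_text.split()):
            encoder_input_data[i, t] = input_token_index[word]
        for t, word in enumerate(target_text.split()):
            if t < len(target_text.split()) - 1:
                decoder_input_data[i, t] = target_token_index[word]
            if t > 0:
                decoder_target_data[i, t - 1, target_token_index[word]] = 1.
    return encoder_input_data, decoder_input_data, decoder_target_data

Note that materializing decoder_target_data as a one-hot array for the whole split can use a lot of memory when num_decoder_tokens is large.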
Then build your dataset with tensorflow.data:
from tensorflow.data import Dataset
encoder_input_data = Dataset.from_tensor_slices(encoder_input_data)
decoder_input_data = Dataset.from_tensor_slices(decoder_input_data)
decoder_target_data = Dataset.from_tensor_slices(decoder_target_data)
ds = Dataset.zip((encoder_input_data, decoder_input_data, decoder_target_data)).map(map_fn).batch(1024)
where map_fn is defined as:
def map_fn(encoder_input, decoder_input, decoder_target):
    return (encoder_input, decoder_input), decoder_target
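To sanity-check that the pipeline produces batches in the ((encoder_input, decoder_input), decoder_target) structure that Model.fit expects, you can inspect the dataset (a quick check, not part of the original answer):

# Structure of a single element of the dataset
print(ds.element_spec)

# Pull one batch and check the shapes
for (enc_batch, dec_batch), target_batch in ds.take(1):
    print(enc_batch.shape, dec_batch.shape, target_batch.shape)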
Finally, use Model.fit instead of Model.fit_generator:
model.fit(x=ds, epochs=epochs)
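The original fit_generator call also fed a validation generator; assuming the same preprocessing, the test split can be turned into a dataset in the same way and passed through validation_data (a sketch mirroring the steps above):

# Build validation arrays with the same (rewritten) generate_batch
enc_val, dec_val, tgt_val = generate_batch(X_test, y_test, batch_size = batch_size)

val_ds = Dataset.zip((
    Dataset.from_tensor_slices(enc_val),
    Dataset.from_tensor_slices(dec_val),
    Dataset.from_tensor_slices(tgt_val),
)).map(map_fn).batch(1024)

model.fit(x=ds, validation_data=val_ds, epochs=epochs)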