如何使用 .predict_on_batch 从 Tensorflow 数据集中预测多个批次？

Question

正如问题所说，我只能使用 model.predict_on_batch() 从我的模型中进行预测。如果我使用 model.predict()，Keras 会尝试将所有内容连接在一起，但这不起作用。
对于我的应用程序（序列到序列模型），动态分组更快。但是，即使我在 Pandas 中完成了它，然后只将数据集用于填充批次，.predict() 仍然不应该工作吗？

如果我能让 predict_on_batch 工作，那就行了。但我只能预测第一批数据集。我如何获得其余部分的预测？我无法遍历数据集，无法使用它...

这是一个较小的代码示例。组与标签相同，但在现实世界中它们显然是两种不同的东西。有3个类，一个序列最多2个值，每批2行数据。有很多评论，我从 Whosebug 的某个地方截取了部分窗口。我希望它对大多数人来说相当清晰。

如果您对如何改进代码有任何其他建议，请发表评论。但是不，那根本不是我的模型的样子。因此，对该部分的建议可能没有帮助。

编辑：Tensorflow 版本 2.1.0

import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Bidirectional, Masking, Input, Dense, GRU
import random
import numpy as np

random.seed(100)
# input data
feature = list(range(3, 14))
# shuffle data
random.shuffle(feature)
# make label from feature data, +1 because we are padding with zero
label = [feat // 5 +1 for feat in feature]
group = label[:]
# random.shuffle(group)
max_group = 2
batch_size = 2

print('Data:')
print(*zip(group, feature, label), sep='\n')

# make dataset from data arrays
ds = tf.data.Dataset.zip((tf.data.Dataset.from_tensor_slices({'group': group, 'feature': feature}), 
                          tf.data.Dataset.from_tensor_slices({'label': label})))
# group by window
ds = ds.apply(tf.data.experimental.group_by_window(
    # use feature as key (you may have to use tf.reshape(x['group'], []) instead of tf.cast)
    key_func=lambda x, y: tf.cast(x['group'], tf.int64),
    # convert each window to a batch
    reduce_func=lambda _, window: window.batch(max_group),
    # use batch size as window size
    window_size=max_group))

# shuffle at most 100k rows, but commented out because we don't want to predict on shuffled data
# ds = ds.shuffle(int(1e5)) 
ds = ds.padded_batch(batch_size,
                     padded_shapes=({s: (None,) for s in ['group', 'feature']}, 
                                    {s: (None,) for s in ['label']}))
# show dataset contents
print('Result:')
for element in ds:
    print(element)

# Keras matches the name in the input to the tensor names in the first part of ds
inp = Input(shape=(None,), name='feature')
# RNNs require an additional rank, even if it is a degenerate dimension
duck = tf.expand_dims(inp, axis=-1)
rnn = GRU(32, return_sequences=True)(duck)
# again Keras matches names
out = Dense(max(label)+1, activation='softmax', name='label')(rnn)
model = Model(inputs=inp, outputs=out)
model.summary()
model.compile(loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.fit(ds, epochs=3)

model.predict_on_batch(ds)

Answer 1

您可以像这样遍历数据集，记住什么是 "x" 什么是 "y" 典型的表示法：

for item in ds:
    xi, yi = item
    pi = model.predict_on_batch(xi)
    print(xi["group"].shape, pi.shape)

当然，这是对每个元素的单独预测。否则你必须自己定义批次，通过将匹配的形状批处理在一起，因为批次大小本身是可变的。

如何使用 .predict_on_batch 从 Tensorflow 数据集中预测多个批次？

How do I predict on more than one batch from a Tensorflow Dataset, using .predict_on_batch?

python

tensorflow

keras

tensorflow-datasets

seq2seq