Getting x_test, y_test from generator in Keras?
For some tasks the validation data cannot be a generator, for example TensorBoard histograms:
If printing histograms, validation_data must be provided, and cannot be a generator.
My current code looks like this:
from keras.models import Sequential
from keras.preprocessing.image import ImageDataGenerator

image_data_generator = ImageDataGenerator()
training_seq = image_data_generator.flow_from_directory(training_dir)
validation_seq = image_data_generator.flow_from_directory(validation_dir)
testing_seq = image_data_generator.flow_from_directory(testing_dir)
model = Sequential(..)
# ..
model.compile(..)
model.fit_generator(training_seq, validation_data=validation_seq, ..)
How can I provide it as validation_data=(x_test, y_test) instead?
Python 2.7 and Python 3.* solution:
from platform import python_version_tuple

if python_version_tuple()[0] == '3':
    xrange = range
    izip = zip
    imap = map
else:
    from itertools import izip, imap

import numpy as np
# ..
# other code as in question
# ..

# Fetch every batch by index, then transpose the sequence of (x, y) pairs
# into one tuple of x batches and one tuple of y batches
x, y = izip(*(validation_seq[i] for i in xrange(len(validation_seq))))
x_val, y_val = np.vstack(x), np.vstack(y)
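The `izip(*...)` line is a transpose: it turns the list of `(x_batch, y_batch)` pairs into one tuple of image batches and one tuple of label batches, which `np.vstack` then joins along the sample axis. Below is a minimal Python 3 sketch that runs without Keras. It assumes only that the iterator behaves like a `keras.utils.Sequence` (`len(seq)` is the number of batches, `seq[i]` returns an `(x_batch, y_batch)` tuple); `FakeSequence` is a hypothetical stand-in for the iterator returned by `flow_from_directory`, filled with random data.

```python
import numpy as np


class FakeSequence(object):
    """Hypothetical stand-in for a Keras DirectoryIterator.

    Holds 10 samples of shape (4, 4, 3) with one-hot labels for 3 classes
    and serves them in batches of 4 (so the last batch has only 2 samples).
    """

    def __init__(self, n_samples=10, batch_size=4, n_classes=3):
        rng = np.random.RandomState(0)
        self.x = rng.rand(n_samples, 4, 4, 3)
        self.y = np.eye(n_classes)[np.arange(n_samples) % n_classes]
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches, rounding up for the final partial batch
        return (len(self.x) + self.batch_size - 1) // self.batch_size

    def __getitem__(self, i):
        s = slice(i * self.batch_size, (i + 1) * self.batch_size)
        return self.x[s], self.y[s]


seq = FakeSequence()

# [(x0, y0), (x1, y1), ...] -> (x0, x1, ...), (y0, y1, ...)
x, y = zip(*(seq[i] for i in range(len(seq))))

# Stack the batches along the sample axis
x_val, y_val = np.vstack(x), np.vstack(y)
print(x_val.shape, y_val.shape)  # (10, 4, 4, 3) (10, 3)
```

Because `np.vstack` only requires the trailing dimensions to match, the shorter final batch is handled without any special casing.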
Or, to also support class_mode='binary':
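x_val = np.vstack(x)
# class_mode is the value passed to flow_from_directory. In 'binary' mode each
# label batch is a 1-D array of 0/1 class indices, so join the batches end to
# end; otherwise stack the 2-D (e.g. one-hot) label batches
y_val = np.concatenate(y) if class_mode == 'binary' else np.vstack(y)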
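With `class_mode='binary'`, each label batch is a 1-D array of 0.0/1.0 values rather than a 2-D one-hot array, so `np.vstack` treats every batch as a separate row and fails when the last batch is shorter. The sketch below uses made-up label batches (illustrative values only, not real iterator output) to show that `np.concatenate` joins them into one flat label vector.

```python
import numpy as np

# Hypothetical label batches as a binary-mode iterator might yield them:
# 1-D float arrays, with a shorter final batch.
y = (np.array([0., 1., 1., 0.]),
     np.array([1., 1., 0., 0.]),
     np.array([1., 0.]))

# np.vstack(y) would raise a ValueError here: it turns each 1-D batch
# into a row, and the rows have different lengths.
y_val = np.concatenate(y)
print(y_val)  # [0. 1. 1. 0. 1. 1. 0. 0. 1. 0.]
```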
Full runnable code: https://gist.github.com/AlecTaylor/7f6cc03ed6c3dd84548a039e2e0fd006
Update (22 June 2018): read the answer provided by the OP for a concise and efficient solution; read mine to understand what is going on.
In Python, you can collect all the data from a generator with:
data = [x for x in generator]
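However, ImageDataGenerators do not terminate (their iterators loop over the data forever), so the approach above never finishes. We can use the same idea with a small modification to make it work in this case: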
data = []  # store all the generated data batches
labels = []  # store all the generated label batches
# Number of iterations, one batch per iteration. It should equal the number of
# batches in one pass over the data, ceil(samples / batch_size), which is what
# len() returns for a Keras iterator; more iterations would repeat samples.
max_iter = len(validation_generator)
i = 0
for d, l in validation_generator:
    data.append(d)
    labels.append(l)
    i += 1
    if i == max_iter:
        break
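Since the iterator never stops on its own, the loop has to stop after exactly one pass over the data. A minimal sketch of the same loop using `itertools.islice` in place of the manual counter; `endless_batches` is a hypothetical generator that cycles forever like an `ImageDataGenerator` iterator, and the batch count is computed by hand because a plain generator has no `len()`.

```python
import itertools
import math

import numpy as np


def endless_batches(x, y, batch_size):
    """Hypothetical stand-in for an ImageDataGenerator iterator:
    yields (x_batch, y_batch) tuples and restarts forever."""
    while True:
        for start in range(0, len(x), batch_size):
            yield x[start:start + batch_size], y[start:start + batch_size]


x_all = np.arange(10 * 2).reshape(10, 2).astype(float)
y_all = np.arange(10) % 2
batch_size = 4
gen = endless_batches(x_all, y_all, batch_size)

# One pass over the data = ceil(samples / batch_size) batches; taking more
# would start repeating samples. With a Keras iterator, len(gen) gives this.
n_batches = int(math.ceil(len(x_all) / float(batch_size)))

data, labels = [], []
for d, l in itertools.islice(gen, n_batches):
    data.append(d)
    labels.append(l)

print(len(data))  # 3
```

`islice` stops after exactly `n_batches` items, so no counter or `break` is needed.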
Now we have two lists of tensor batches. We need to merge them into two tensors, one for the data (i.e. X) and one for the labels (i.e. y):
import numpy as np

# Join the batches along the first (sample) axis. This equals reshaping a
# (n_batches, batch_size, ...) array into (n_samples, ...), but also works
# when the last batch is smaller than the others.
data = np.concatenate(data)
labels = np.concatenate(labels)
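Merging the first two axes with `np.reshape` is equivalent to concatenating the batches along the sample axis, as long as every batch has the same size. The sketch below, using hypothetical random batches, checks that equivalence and shows that `np.concatenate` also copes with a shorter final batch, which `flow_from_directory` produces when the number of images is not a multiple of `batch_size`.

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical collected batches: three full batches of 4 images (8x8 RGB)
data = [rng.rand(4, 8, 8, 3) for _ in range(3)]

# The reshape approach: stack into (n_batches, batch_size, ...) and then
# merge the first two axes into one sample axis.
stacked = np.array(data)
X_reshape = np.reshape(stacked, (stacked.shape[0] * stacked.shape[1],) + stacked.shape[2:])

# Concatenating along the sample axis gives the same result...
X_concat = np.concatenate(data)
print(np.array_equal(X_reshape, X_concat))  # True

# ...and, unlike np.array + reshape, still works when the final batch
# is shorter (here 4 + 4 + 2 samples).
ragged = data[:2] + [rng.rand(2, 8, 8, 3)]
print(np.concatenate(ragged).shape)  # (10, 8, 8, 3)
```

The resulting data and label arrays can then be passed as `validation_data=(x, y)`, as asked in the question.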