Keras/Theano: Node compilation failed during training
I'm trying to train an already compiled Keras model on Mac OS X, but I get the following error:
```
Problem occurred during compilation with the command line below:
/usr/bin/clang++ -dynamiclib -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -march=haswell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -undefined dynamic_lookup -I/usr/local/lib/python2.7/site-packages/numpy/core/include -I/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/include/python2.7 -I/usr/local/lib/python2.7/site-packages/theano/gof -L/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib -fvisibility=hidden -o /Users/valencra/.theano/compiledir_Darwin-16.4.0-x86_64-i386-64bit-i386-2.7.13-64/tmp9ahb_h/c6acccb2fd68eac67ca5b0f0fb9ad9bb.so /Users/valencra/.theano/compiledir_Darwin-16.4.0-x86_64-i386-64bit-i386-2.7.13-64/tmp9ahb_h/mod.cpp
/Users/valencra/.theano/compiledir_Darwin-16.4.0-x86_64-i386-64bit-i386-2.7.13-64/tmp9ahb_h/mod.cpp:894:21: warning: comparison of array 'outputs' equal to a null pointer is always false [-Wtautological-pointer-compare]
if (outputs == NULL) {
^~~~~~~ ~~~~
/Users/valencra/.theano/compiledir_Darwin-16.4.0-x86_64-i386-64bit-i386-2.7.13-64/tmp9ahb_h/mod.cpp:919:54: error: arithmetic on a pointer to void
PyArray_DATA(V3) + data_offset,
~~~~~~~~~~~~~~~~ ^
1 warning and 1 error generated.
Traceback (most recent call last):
  File "osr.py", line 359, in <module>
    osr.train_osr_model()
  File "osr.py", line 88, in train_osr_model
    nb_worker=1)
  File "/usr/local/lib/python2.7/site-packages/keras/engine/training.py", line 1454, in fit_generator
    self._make_train_function()
  File "/usr/local/lib/python2.7/site-packages/keras/engine/training.py", line 767, in _make_train_function
    **self._function_kwargs)
  File "/usr/local/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 969, in function
    return Function(inputs, outputs, updates=updates, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 955, in __init__
    **kwargs)
  File "/usr/local/lib/python2.7/site-packages/theano/compile/function.py", line 326, in function
    output_keys=output_keys)
  File "/usr/local/lib/python2.7/site-packages/theano/compile/pfunc.py", line 486, in pfunc
    output_keys=output_keys)
  File "/usr/local/lib/python2.7/site-packages/theano/compile/function_module.py", line 1795, in orig_function
    defaults)
  File "/usr/local/lib/python2.7/site-packages/theano/compile/function_module.py", line 1661, in create
    input_storage=input_storage_lists, storage_map=storage_map)
  File "/usr/local/lib/python2.7/site-packages/theano/gof/link.py", line 699, in make_thunk
    storage_map=storage_map)[:3]
  File "/usr/local/lib/python2.7/site-packages/theano/gof/vm.py", line 1063, in make_all
    impl=impl))
  File "/usr/local/lib/python2.7/site-packages/theano/gof/op.py", line 924, in make_thunk
    no_recycling)
  File "/usr/local/lib/python2.7/site-packages/theano/gof/op.py", line 828, in make_c_thunk
    output_storage=node_output_storage)
  File "/usr/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1190, in make_thunk
    keep_lock=keep_lock)
  File "/usr/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1131, in __compile__
    keep_lock=keep_lock)
  File "/usr/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1586, in cthunk_factory
    key=key, lnk=self, keep_lock=keep_lock)
  File "/usr/local/lib/python2.7/site-packages/theano/gof/cmodule.py", line 1155, in module_from_key
    module = lnk.compile_cmodule(location)
  File "/usr/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1489, in compile_cmodule
    preargs=preargs)
  File "/usr/local/lib/python2.7/site-packages/theano/gof/cmodule.py", line 2304, in compile_str
    (status, compile_stderr.replace('\n', '. ')))
Exception: ('The following error happened while compiling the node', Split{4}(InplaceDimShuffle{1,0,2}.0, TensorConstant{2}, TensorConstant{(4,) of 256}), '\n', "Compilation failed (return status=1): /Users/valencra/.theano/compiledir_Darwin-16.4.0-x86_64-i386-64bit-i386-2.7.13-64/tmp9ahb_h/mod.cpp:894:21: warning: comparison of array 'outputs' equal to a null pointer is always false [-Wtautological-pointer-compare]. if (outputs == NULL) {. ^~~~~~~ ~~~~. /Users/valencra/.theano/compiledir_Darwin-16.4.0-x86_64-i386-64bit-i386-2.7.13-64/tmp9ahb_h/mod.cpp:919:54: error: arithmetic on a pointer to void. PyArray_DATA(V3) + data_offset,. ~~~~~~~~~~~~~~~~ ^. 1 warning and 1 error generated.. ", '[*1 -> Split{4}(<TensorType(float32, 3D)>, TensorConstant{2}, TensorConstant{(4,) of 256}), *1::1, *1::2, *1::3]')
```
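The node that fails to build is `Split{4}(InplaceDimShuffle{1,0,2}.0, TensorConstant{2}, TensorConstant{(4,) of 256})`: it swaps the first two axes of a float32 3D tensor and then cuts it along axis 2 into four pieces of width 256. The C++ error itself comes from the generated line `PyArray_DATA(V3) + data_offset`: NumPy documents `PyArray_DATA` as returning `void *`, and pointer arithmetic on `void *` is not allowed in C++, so clang++ rejects it. The NumPy sketch below only mimics what the node computes, not the C code that fails; the input shape is hypothetical, and reading the four 256-wide pieces as the gate blocks of the 256-unit LSTM is an assumption.

```python
import numpy as np


def theano_style_split(x, axis, sizes):
    """Mimic Theano's Split op: cut `x` along `axis` into pieces of the given sizes.

    Theano's Split takes a vector of piece *sizes*, while np.split expects the
    *indices* where the cuts happen, so the sizes are turned into cumulative offsets.
    """
    assert sum(sizes) == x.shape[axis], "sizes must cover the whole axis"
    cut_points = np.cumsum(sizes)[:-1]
    return np.split(x, cut_points, axis=axis)


# Hypothetical activations: (batch=2, frames=30, 4 * 256 features)
acts = np.zeros((2, 30, 4 * 256), dtype=np.float32)
# InplaceDimShuffle{1,0,2} corresponds to swapping the first two axes
shuffled = acts.transpose(1, 0, 2)
pieces = theano_style_split(shuffled, axis=2, sizes=[256] * 4)
print([p.shape for p in pieces])
```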
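I updated Keras and Theano, but the problem persists. I'm confused, because training this exact same model worked fine just a few days ago. Here are the functions used during training: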
```python
import numpy as np
import h5py

def train_osr_model(self):
    """ Train the optical speech recognizer
    """
    print "\nTraining OSR"
    validation_ratio = 0.3
    batch_size = 32
    # The HDF5 file must stay open while the generators read from it,
    # so everything up to and including fit_generator runs inside `with`
    with h5py.File(self.training_save_fn, "r") as training_save_file:
        sample_count = int(training_save_file.attrs["sample_count"])
        # Shuffle all sample indices, then split them 70/30 into training/validation
        sample_idxs = range(0, sample_count)
        sample_idxs = np.random.permutation(sample_idxs)
        training_sample_idxs = sample_idxs[0:int((1-validation_ratio)*sample_count)]
        validation_sample_idxs = sample_idxs[int((1-validation_ratio)*sample_count):]
        training_sequence_generator = self.generate_training_sequences(batch_size=batch_size,
                                                                       training_save_file=training_save_file,
                                                                       training_sample_idxs=training_sample_idxs)
        validation_sequence_generator = self.generate_validation_sequences(batch_size=batch_size,
                                                                           training_save_file=training_save_file,
                                                                           validation_sample_idxs=validation_sample_idxs)
        print "Sample Idxs: {0}\n".format(sample_idxs) # FOR DEBUG ONLY
        print "Training Idxs: {0}\n".format(training_sample_idxs) # FOR DEBUG ONLY
        print "Validation Idxs: {0}\n".format(validation_sample_idxs) # FOR DEBUG ONLY
        # ProgressDisplay: presumably a custom Keras callback defined elsewhere (not shown)
        pbi = ProgressDisplay()
        self.osr.fit_generator(generator=training_sequence_generator,
                               validation_data=validation_sequence_generator,
                               samples_per_epoch=len(training_sample_idxs),
                               nb_val_samples=len(validation_sample_idxs),
                               nb_epoch=10,
                               max_q_size=1,
                               verbose=2,
                               callbacks=[pbi],
                               class_weight=None,
                               nb_worker=1)
```
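The shuffle-and-split step at the top of `train_osr_model` can be sketched with only the standard library. This is an illustrative stand-in rather than the author's code: `split_indices` is a hypothetical helper, and `random.Random(seed).shuffle` replaces `np.random.permutation`. The cut point uses the same `int((1 - validation_ratio) * sample_count)` expression as the original.

```python
import random


def split_indices(sample_count, validation_ratio, seed=None):
    """Shuffle range(sample_count) and cut it into training/validation index lists."""
    idxs = list(range(sample_count))
    # A seeded Random instance makes the shuffle reproducible between runs
    random.Random(seed).shuffle(idxs)
    cut = int((1 - validation_ratio) * sample_count)
    return idxs[:cut], idxs[cut:]


train_idxs, val_idxs = split_indices(100, 0.3, seed=0)
print("train: {0}, validation: {1}".format(len(train_idxs), len(val_idxs)))
```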
```python
def generate_training_sequences(self, batch_size, training_save_file, training_sample_idxs):
    """ Generates training sequences from HDF5 file on demand
    """
    while True:
        # generate sequences for training
        training_sample_count = len(training_sample_idxs)
        batches = int(training_sample_count/batch_size)
        remainder_samples = training_sample_count%batch_size
        if remainder_samples:
            batches = batches + 1
        # generate batches of samples
        for idx in xrange(0, batches):
            if idx == batches - 1:
                batch_idxs = training_sample_idxs[idx*batch_size:]
            else:
                batch_idxs = training_sample_idxs[idx*batch_size:idx*batch_size+batch_size]
            # h5py fancy indexing requires indices in increasing order
            batch_idxs = np.sort(batch_idxs)
            print batch_idxs # FOR DEBUG ONLY
            X = training_save_file["X"][batch_idxs]
            Y = training_save_file["Y"][batch_idxs]
            yield (np.array(X), np.array(Y))

def generate_validation_sequences(self, batch_size, training_save_file, validation_sample_idxs):
    """ Generates validation sequences from HDF5 file on demand
    """
    while True:
        # generate sequences for validation
        validation_sample_count = len(validation_sample_idxs)
        batches = int(validation_sample_count/batch_size)
        remainder_samples = validation_sample_count%batch_size
        if remainder_samples:
            batches = batches + 1
        # generate batches of samples
        for idx in xrange(0, batches):
            if idx == batches - 1:
                batch_idxs = validation_sample_idxs[idx*batch_size:]
            else:
                batch_idxs = validation_sample_idxs[idx*batch_size:idx*batch_size+batch_size]
            # h5py fancy indexing requires indices in increasing order
            batch_idxs = np.sort(batch_idxs)
            print batch_idxs # FOR DEBUG ONLY
            X = training_save_file["X"][batch_idxs]
            Y = training_save_file["Y"][batch_idxs]
            yield (np.array(X), np.array(Y))
```
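`generate_training_sequences` and `generate_validation_sequences` differ only in their names, so a single index generator could serve both. A minimal sketch, assuming a hypothetical `index_batches` helper: it replaces the remainder bookkeeping with a ceiling division and sorts each batch, because h5py's list-based (fancy) selection expects indices in increasing order while the shuffled indices are not.

```python
import itertools


def index_batches(sample_idxs, batch_size):
    """Yield sorted batches of indices forever, like the two generators above."""
    n = len(sample_idxs)
    if n == 0:
        raise ValueError("no samples to batch")
    # Ceiling division: same count as int(n / batch_size), plus one for a remainder
    batches = -(-n // batch_size)
    while True:
        for b in range(batches):
            # Slicing past the end is safe, so the short last batch needs no special case;
            # sort because h5py fancy selection expects increasing indices
            yield sorted(sample_idxs[b * batch_size:(b + 1) * batch_size])


first_four = list(itertools.islice(index_batches([5, 3, 9, 0, 7], 2), 4))
print(first_four)
```

Each yielded batch would then be read as `training_save_file["X"][batch]` and `training_save_file["Y"][batch]`, as in the original generators.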
For reference, here is the model being trained:
```python
import h5py
from keras.applications.vgg16 import VGG16
from keras.layers import Input, Dense, LSTM, TimeDistributed, GlobalAveragePooling2D
from keras.models import Model
from keras.optimizers import Nadam

def generate_osr_model(self):
    """ Builds the optical speech recognizer model
    """
    print "".join(["\nGenerating OSR model\n",
                   "-"*40])
    with h5py.File(self.training_save_fn, "r") as training_save_file:
        class_count = len(training_save_file.attrs["training_classes"].split(","))
    # Input: a sequence of frames, each channels-first (3, rows, columns)
    video = Input(shape=(self.frames_per_sequence,
                         3,
                         self.rows,
                         self.columns))
    # Frozen ImageNet VGG16 convolutional base, pooled to one 512-d vector per frame
    cnn_base = VGG16(input_shape=(3,
                                  self.rows,
                                  self.columns),
                     weights="imagenet",
                     include_top=False)
    cnn_out = GlobalAveragePooling2D()(cnn_base.output)
    cnn = Model(input=cnn_base.input, output=cnn_out)
    cnn.trainable = False
    # Apply the CNN to every frame, then summarise the frame sequence with an LSTM
    encoded_frames = TimeDistributed(cnn)(video)
    encoded_vid = LSTM(256)(encoded_frames)
    hidden_layer = Dense(output_dim=1024, activation="relu")(encoded_vid)
    outputs = Dense(output_dim=class_count, activation="softmax")(hidden_layer)
    osr = Model([video], outputs)
    optimizer = Nadam(lr=0.002,
                      beta_1=0.9,
                      beta_2=0.999,
                      epsilon=1e-08,
                      schedule_decay=0.004)
    osr.compile(loss="categorical_crossentropy",
                optimizer=optimizer,
                metrics=["categorical_accuracy"])
    self.osr = osr
    print " * OSR MODEL GENERATED * "
```
Model summary:
```
Generating OSR model
----------------------------------------
 * OSR MODEL GENERATED * 
*** MODEL SUMMARY ***
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
input_1 (InputLayer)             (None, 30, 3, 100, 15 0
____________________________________________________________________________________________________
timedistributed_1 (TimeDistribut (None, 30, 512)       14714688    input_1[0][0]
____________________________________________________________________________________________________
lstm_1 (LSTM)                    (None, 256)           787456      timedistributed_1[0][0]
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 1024)          263168      lstm_1[0][0]
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 3)             3075        dense_1[0][0]
====================================================================================================
Total params: 15,768,387
Trainable params: 1,053,699
Non-trainable params: 14,714,688
```
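The parameter counts in the summary can be checked by hand: only the LSTM and the two Dense layers are trainable, because `cnn.trainable = False` freezes the VGG16 block. This sketch assumes the standard LSTM parameterisation (four gates, each with input weights, recurrent weights and one bias vector), which matches the 787,456 reported for `lstm_1`; the helper names are hypothetical.

```python
def lstm_params(input_dim, units):
    """Standard LSTM: 4 gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Fully connected layer: weight matrix plus one bias per unit."""
    return input_dim * units + units


vgg16_conv_params = 14714688  # frozen TimeDistributed(VGG16) block, taken from the summary
lstm = lstm_params(512, 256)      # 512-d frame features in, 256 units
dense_1 = dense_params(256, 1024)
dense_2 = dense_params(1024, 3)   # 3 output classes, as in the summary
trainable = lstm + dense_1 + dense_2
total = trainable + vgg16_conv_params
print("{0} {1} {2} {3} {4}".format(lstm, dense_1, dense_2, trainable, total))
```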
The problem seems to stem from installing Theano and Keras from their GitHub repositories, like this:
```
pip install git+git://github.com/Theano/Theano.git
pip install git+git://github.com/fchollet/keras.git
```
I fixed it by uninstalling Theano and Keras and then reinstalling them directly with pip:
```
pip uninstall Theano
pip uninstall keras
pip install Theano
pip install keras
```
The bleeding-edge versions of Theano or Keras may contain a bug. Hopefully this helps someone else too.
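To confirm which builds are actually imported after reinstalling, a small check like the following may help. It is a sketch: `installed_version` and `looks_like_dev_build` are hypothetical helpers, and treating a `dev` marker in `__version__` as a sign of a Git snapshot is a heuristic assumption, not something Theano or Keras guarantees.

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Any import failure (missing package or broken install) is reported as None
        return None
    return getattr(module, "__version__", None)


def looks_like_dev_build(version):
    """Heuristic (assumption): development snapshots carry 'dev' in the version string."""
    return version is not None and "dev" in version.lower()


for name in ("theano", "keras"):
    version = installed_version(name)
    flag = " (looks like a dev build)" if looks_like_dev_build(version) else ""
    print("{0}: {1}{2}".format(name, version, flag))
```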
EDIT: The problem does appear to come from Theano's master branch. For a potential permanent fix, follow the issue I opened on the Theano repository: https://github.com/Theano/Theano/issues/5655