Talos.Scan() stops short without error before completing permutations
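I have tried several debugging options, but I cannot get talos to run more than a couple of permutations before it stops, and it gives no hint about what went wrong. The scenario looks simple, so what am I doing wrong?
The input data is available here.
Below are my model function, the parameter space, and the talos.Scan() call. The full code is available here.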
# Create, compile and fit network
# This is rewritten for talos hyperparameter optimization
# Removed kernel_initializer='normal' from dense layers from example. Default is glorot_uniform
def createNetworkAndFit(trainVectors, trainLabels, validationVectors, validationLabels, params):
    # Create model
    model = Sequential()
    model.add(Dense(params['first_neuron'], input_dim=trainVectors.shape[1], activation=params['activation']))
    model.add(Dropout(params['dropout']))
    talos.model.layers.hidden_layers(model, params, 1)
    model.add(Dense(1, activation=params['last_activation']))
    # Compile model
    model.compile(loss=params['losses'],
                  optimizer=params['optimizer'](),
                  metrics=['accuracy', fmeasure_acc, 'mean_squared_error'])
    # Fit model
    history = model.fit(trainVectors, trainLabels,
                        validation_data=[validationVectors, validationLabels],
                        batch_size=params['batch_size'],
                        epochs=params['epochs'],
                        verbose=0)
    return history, model
# Define hyperparameter space
# As hidden layers are generated, "last_neuron" is the number of hidden units.
# Does this mean all hidden layers have the same number of hidden units?
p = {'first_neuron': [trainVectors.shape[1]],
     'dropout': [0, 0.25, 0.5],
     'hidden_layers': [2, 3],
     'shapes': ['brick', 'funnel'],
     'batch_size': [trainVectors.shape[0],
                    int(trainVectors.shape[0]/10),
                    int(trainVectors.shape[0]/100),
                    int(trainVectors.shape[0]/1000)],
     'epochs': [300],
     'optimizer': [Nadam, Adam, RMSprop],
     'losses': [binary_crossentropy],
     'activation': [relu, elu],
     'last_activation': ['sigmoid']}
# Hyperparameter search
experiment = talos.Scan(x=trainVectors,
                        y=trainLabels,
                        model=createNetworkAndFit,
                        grid_downsample=0.01,
                        params=p,
                        dataset_name='15000_talos',
                        experiment_no='1',
                        print_params=True,
                        disable_progress_bar=True,
                        clear_tf_session=True,
                        debug=True)
Here is my output:
Using TensorFlow backend.
{'batch_size': 312, 'hidden_layers': 3, 'activation': <function relu at 0x7f77e75e9510>, 'epochs': 300, 'optimizer': <class 'keras.optimizers.Nadam'>, 'shapes': 'brick', 'last_activation': 'sigmoid', 'losses': <function binary_crossentropy at 0x7f777dee6ae8>, 'first_neuron': 52, 'dropout': 0.25}
2019-06-02 10:46:45.248187: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-02 10:46:45.293153: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-02 10:46:45.293569: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 780 major: 3 minor: 5 memoryClockRate(GHz): 0.941
pciBusID: 0000:01:00.0
totalMemory: 2.95GiB freeMemory: 2.84GiB
2019-06-02 10:46:45.293595: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-06-02 10:46:45.478345: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-02 10:46:45.478378: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-06-02 10:46:45.478395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-06-02 10:46:45.478491: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2560 MB memory) -> physical GPU (device: 0, name: GeForce GTX 780, pci bus id: 0000:01:00.0, compute capability: 3.5)
{'batch_size': 3120, 'hidden_layers': 3, 'activation': <function elu at 0x7f77e75e92f0>, 'epochs': 300, 'optimizer': <class 'keras.optimizers.RMSprop'>, 'shapes': 'brick', 'last_activation': 'sigmoid', 'losses': <function binary_crossentropy at 0x7f777dee6ae8>, 'first_neuron': 52, 'dropout': 0.5}
2019-06-02 10:46:56.373641: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-06-02 10:46:56.373692: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-02 10:46:56.373707: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-06-02 10:46:56.373712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-06-02 10:46:56.373799: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2560 MB memory) -> physical GPU (device: 0, name: GeForce GTX 780, pci bus id: 0000:01:00.0, compute capability: 3.5)
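Edit 1
I noticed that some of the parameters in p were not used in the model function. After changing that, the search still stops. I have edited the code above.
The problem was that the grid_downsample value I chose (0.01) was too small for the number of possible permutations in the parameter space. The space p contains only 288 permutations (3 × 2 × 2 × 4 × 3 × 2), so a 1% sample leaves just two or three of them, which is consistent with the two rounds shown in the output. It would be nice if Talos gave more feedback about the grid size in relation to the random downsampling. This is the Scan() call I ended up with: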
# Hyperparameter search
experiment = talos.Scan(x=trainVectors,
                        y=trainLabels,
                        model=createNetworkAndFit,
                        grid_downsample=1,
                        params=p,
                        dataset_name='15000_talos',
                        experiment_no='1',
                        print_params=True,
                        disable_progress_bar=True,
                        clear_tf_session=True,
                        debug=True)
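To see why such a small grid_downsample ends the scan so quickly, it can help to count the permutations before calling Scan(). The following is a minimal sketch, not part of Talos: grid_size and sampled_permutations are hypothetical helpers, the dictionary uses stand-in values with the same list lengths as p (the 52 features and roughly 31200 training rows are inferred from the printed parameters), and truncating the product to an integer is only an assumption about how the downsampled count is rounded.

```python
from math import prod  # requires Python 3.8+


def grid_size(params):
    """Number of permutations in a full grid over the given value lists."""
    return prod(len(values) for values in params.values())


def sampled_permutations(params, grid_downsample):
    """Estimated rounds left after downsampling.

    Truncation with int() is an assumption, not confirmed Talos behavior.
    """
    return int(grid_size(params) * grid_downsample)


# Stand-in values: only the length of each list matters for the count.
space = {
    'first_neuron': [52],
    'dropout': [0, 0.25, 0.5],
    'hidden_layers': [2, 3],
    'shapes': ['brick', 'funnel'],
    'batch_size': [31200, 3120, 312, 31],
    'epochs': [300],
    'optimizer': ['Nadam', 'Adam', 'RMSprop'],
    'losses': ['binary_crossentropy'],
    'activation': ['relu', 'elu'],
    'last_activation': ['sigmoid'],
}

print(grid_size(space))                   # 288
print(sampled_permutations(space, 0.01))  # 2
print(sampled_permutations(space, 1))     # 288
```

Running a check like this before the scan makes it obvious when the downsampling fraction, rather than an error, is what limits the number of rounds.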