TensorFlow on Jupyter: Can't restore variables
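When using TensorFlow in a Jupyter notebook, I can't seem to restore saved variables. I train an ANN, then I run saver.save(sess, "params1.ckpt"). Then I train it again and save the new results with saver.save(sess, "params2.ckpt"). But when I run saver.restore(sess, "params1.ckpt"), my model doesn't load the values saved in params1.ckpt; it keeps the ones saved in params2.ckpt.
If I run the model, save it to params.ckpt, then close and halt the notebook, and then try to load it again, I get the following error: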
---------------------------------------------------------------------------
StatusNotOK Traceback (most recent call last)
StatusNotOK: Not found: Tensor name "Variable/Adam" not found in checkpoint files params.ckpt
[[Node: save/restore_slice_1 = RestoreSlice[dt=DT_FLOAT, preferred_shard=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/restore_slice_1/tensor_name, save/restore_slice_1/shape_and_slice)]]
During handling of the above exception, another exception occurred:
SystemError Traceback (most recent call last)
<ipython-input-6-39ae6b7641bd> in <module>()
----> 1 saver.restore(sess, "params.ckpt")
/usr/local/lib/python3.5/site-packages/tensorflow/python/training/saver.py in restore(self, sess, save_path)
889 save_path: Path where parameters were previously saved.
890 """
--> 891 sess.run([self._restore_op_name], {self._filename_tensor_name: save_path})
892
893
/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict)
366
367 # Run request and get response.
--> 368 results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
369
370 # User may have fetched the same tensor multiple times, but we
/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_run(self, target_list, fetch_list, feed_dict)
426
427 return tf_session.TF_Run(self._session, feed_dict, fetch_list,
--> 428 target_list)
429
430 except tf_session.StatusNotOK as e:
SystemError: <built-in function delete_Status> returned a result with an error set
My training code is:
def weight_variable(shape, name):
    initial = tf.truncated_normal(shape, stddev=1.0, name=name)
    return tf.Variable(initial)

def bias_variable(shape, name):
    initial = tf.constant(1.0, shape=shape)
    return tf.Variable(initial, name=name)

input_file = pd.read_csv('P2R0PC0.csv')
features = #vector with 5 feature names
targets = #vector with 4 feature names
x_data = input_file.as_matrix(features)
t_data = input_file.as_matrix(targets)

x = tf.placeholder(tf.float32, [None, x_data.shape[1]])
hiddenDim = 5
b1 = bias_variable([hiddenDim], name = "b1")
W1 = weight_variable([x_data.shape[1], hiddenDim], name = "W1")
b2 = bias_variable([t_data.shape[1]], name = "b2")
W2 = weight_variable([hiddenDim, t_data.shape[1]], name = "W2")
hidden = tf.nn.sigmoid(tf.matmul(x, W1) + b1)
y = tf.nn.sigmoid(tf.matmul(hidden, W2) + b2)
t = tf.placeholder(tf.float32, [None, t_data.shape[1]])

lambda1 = 1
beta1 = 1
lambda2 = 1
beta2 = 1
error = -tf.reduce_sum(t * tf.log(tf.clip_by_value(y,1e-10,1.0)) + (1 - t) * tf.log(tf.clip_by_value(1 - y,1e-10,1.0)))
complexity = lambda1 * tf.nn.l2_loss(W1) + beta1 * tf.nn.l2_loss(b1) + lambda2 * tf.nn.l2_loss(W2) + beta2 * tf.nn.l2_loss(b2)
loss = error + complexity
train_step = tf.train.AdamOptimizer(0.001).minimize(loss)

sess = tf.Session()
init = tf.initialize_all_variables()
sess.run(init)

ran = 25001
delta = 250
plot_data = np.zeros(int(ran / delta + 1))
k = 0
for i in range(ran):
    train_step.run({x: data, t: labels}, sess)
    if i % delta == 0:
        plot_data[k] = loss.eval({x: data, t: labels}, sess)
        #plot_training[k] = loss.eval({x: x_test, t: t_test}, sess)
        print(str(plot_data[k]))
        k = k + 1
plt.plot(np.arange(start=2, stop=int(ran / delta + 1)), plot_data[2:])

saver = tf.train.Saver()
saver.save(sess, "params.ckpt")
error.eval({x:data, t: labels}, session=sess)
Am I doing something wrong? Why can't I restore my variables?
It looks like you are using Jupyter to build your model. One possible problem when constructing a tf.train.Saver with the default arguments is that it uses the (auto-generated) names of the variables as the keys in your checkpoint. Since it's easy to re-execute code cells multiple times in Jupyter, you might be ending up with multiple copies of the variable nodes in the session that you save.
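To make the naming issue concrete, here is a minimal sketch of what re-executing a cell does to variable names. It assumes TensorFlow 2.x (or a late 1.x release) and uses the tensorflow.compat.v1 API as a stand-in for the older API in the question; build_cell is a hypothetical helper that plays the role of a notebook cell creating an unnamed variable.

```python
import tensorflow.compat.v1 as tf  # assumption: the v1 compatibility API is available

tf.disable_eager_execution()  # use graph mode, as in the question's code


def build_cell():
    """Hypothetical stand-in for a notebook cell that creates an unnamed variable."""
    return tf.Variable(tf.zeros([2]))


# "Running the cell" three times against the same default graph:
# each run adds a new node, and the graph makes every name unique.
tf.reset_default_graph()
names = [build_cell().name for _ in range(3)]
print(names)

# Starting from an empty default graph gives the first name back.
tf.reset_default_graph()
fresh_name = build_cell().name
print(fresh_name)

# A brand-new graph used as the default has the same effect.
with tf.Graph().as_default():
    isolated_name = build_cell().name
print(isolated_name)
```

A default Saver keys the checkpoint by these names, so a checkpoint written after several re-executions may not contain the names that a freshly started kernel asks for.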
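There are a few possible solutions. Here are the simplest:
Call tf.reset_default_graph() before building your model (and the Saver). This ensures that the variables get the names you intended, but it invalidates previously created graphs.
Use explicit arguments to tf.train.Saver() to specify persistent names for the variables. For your example this shouldn't be too hard (although it becomes unwieldy for larger models):
saver = tf.train.Saver(var_list={"b1": b1, "W1": W1, "b2": b2, "W2": W2})
Create a new tf.Graph() and make it the default each time you create a model. This can be tricky in Jupyter, since it forces you to put all of the model-building code in one cell, but it works well for scripts:
with tf.Graph().as_default():
    # Model building and training/evaluation code goes here.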
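The explicit var_list option can be sketched as a small save/restore round trip. This is only an illustration under stated assumptions: it uses the tensorflow.compat.v1 API (TensorFlow 2.x or a late 1.x release) rather than the older API from the question, build_model is a hypothetical two-variable stand-in for the real network, and the checkpoint is written to a temporary directory.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf  # assumption: the v1 compatibility API is available

tf.disable_eager_execution()  # use graph mode, as in the question's code


def build_model():
    """Hypothetical stand-in for the cell that creates the model variables."""
    b1 = tf.Variable(tf.constant(1.0, shape=[5]))   # no name given: auto-named
    W1 = tf.Variable(tf.truncated_normal([5, 5]))   # no name given: auto-named
    return b1, W1


ckpt_path = os.path.join(tempfile.mkdtemp(), "params.ckpt")

# First execution: build, initialize, and save under explicit keys.
tf.reset_default_graph()
b1, W1 = build_model()
saver = tf.train.Saver(var_list={"b1": b1, "W1": W1})
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saved_W1 = sess.run(W1)
    saver.save(sess, ckpt_path)

# The checkpoint is keyed by the dictionary keys, not by the graph names.
stored_keys = sorted(name for name, _ in tf.train.list_variables(ckpt_path))
print(stored_keys)

# Second execution WITHOUT resetting the graph: the new variables get
# different auto-generated names, but the explicit keys still match.
b1_again, W1_again = build_model()
saver_again = tf.train.Saver(var_list={"b1": b1_again, "W1": W1_again})
with tf.Session() as sess:
    saver_again.restore(sess, ckpt_path)
    restored_W1 = sess.run(W1_again)

print(b1.name, b1_again.name)
print((saved_W1 == restored_W1).all())
```

Because the keys are chosen by hand, the restore does not depend on how many times the model-building cell was executed. Note that a Saver built this way only covers the variables listed, so optimizer state such as Adam's slot variables is not saved unless it is added to var_list as well.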