tf.train.range_input_producer(epoch_size, shuffle=True) does not terminate nor induce CPU/GPU load

Question

I'm running into a weird issue while working on RNNs. I am following the TensorFlow RNN Tutorial and trying my own (simpler) implementation, which is heavily inspired by R2RT's blog post: Recurrent Neural Networks in Tensorflow I.

After debugging, I identified that the issue comes from range_input_producer in tensorflow.models.rnn.ptb.reader.py (line 115).

I isolated it in a minimal example:

import tensorflow as tf

epoch_size = 20
i = tf.train.range_input_producer(epoch_size, shuffle=False).dequeue()

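This is what ptb_producer does (with a variable value for epoch_size). It turns out that this code does not terminate (I don't even call any session.run(...)), nor does it use any CPU. I guess the queue is waiting for something.

Any clue? Thanks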

pltrdy

Answer 1

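Because the queue is empty, you are probably hitting a blocking dequeue. (ptb_producer uses tf.train.range_input_producer, which uses a FIFOQueue.) According to the documentation, a dequeue blocks until the queue has an element to return. Double-check your directory and data.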
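
As a rough analogy only, the blocking behaviour can be reproduced with Python's standard queue module instead of TensorFlow's FIFOQueue. The helper names dequeue_with_timeout and start_producer are made up for this sketch, and the timeout exists only so the example returns instead of hanging the way the real dequeue would.

```python
import queue
import threading


def dequeue_with_timeout(q, timeout):
    """Return the next item, or None if nothing arrives within `timeout` seconds."""
    try:
        return q.get(timeout=timeout)
    except queue.Empty:
        return None


def start_producer(q, epoch_size):
    """Fill the queue with 0..epoch_size-1 from a background thread."""
    def fill():
        for index in range(epoch_size):
            q.put(index)

    thread = threading.Thread(target=fill, daemon=True)
    thread.start()
    return thread


q = queue.Queue(maxsize=32)

# Nothing is feeding the queue yet, so the consumer only waits.
before = dequeue_with_timeout(q, timeout=0.2)

# Once a producer thread runs, the same call returns an element.
producer = start_producer(q, epoch_size=20)
after = dequeue_with_timeout(q, timeout=2.0)

print(before, after)
```

A consumer that waits with no timeout on a queue nobody fills sleeps forever, which would match the symptom in the question: no termination and no CPU load.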

Answer 2

If you just use with tf.Session() as sess:, you must start the queue-runner threads explicitly with threads = tf.train.start_queue_runners(). In ptb_word_lm.py, however, the code is sv = tf.train.Supervisor() followed by with sv.managed_session() as sess:, and the Supervisor starts those threads implicitly.
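
A minimal sketch of the explicit variant, applied to the example from the question. It assumes a TensorFlow build that ships the tensorflow.compat.v1 module (on an original 1.x install, plain import tensorflow as tf would be used instead); the function name dequeue_indices and the Coordinator-based shutdown are choices made for this illustration, not code from ptb_word_lm.py.

```python
import tensorflow.compat.v1 as tf  # assumption: TF 1.x API via the compat module


def dequeue_indices(epoch_size, num_steps):
    """Dequeue `num_steps` indices after starting the queue runners explicitly."""
    graph = tf.Graph()
    with graph.as_default():
        # Same op as in the question: a queue fed with 0..epoch_size-1.
        i = tf.train.range_input_producer(epoch_size, shuffle=False).dequeue()

        with tf.Session(graph=graph) as sess:
            coord = tf.train.Coordinator()
            # Without this call no thread fills the queue, and sess.run(i) blocks.
            threads = tf.train.start_queue_runners(sess=sess, coord=coord)
            try:
                values = [int(sess.run(i)) for _ in range(num_steps)]
            finally:
                # Ask the runner threads to stop and wait for them to finish.
                coord.request_stop()
                coord.join(threads)
    return values


first_indices = dequeue_indices(epoch_size=20, num_steps=5)
print(first_indices)
```

With shuffle=False the indices come out in order, so the first values should be 0, 1, 2, and so on.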

python

tensorflow

recurrent-neural-network

word-embedding