TensorFlow Word2Vec model running on GPU
In this TensorFlow example, training a skip-gram Word2Vec model is described. It contains the following code fragment, which explicitly requires a CPU device for the computation, i.e. tf.device('/cpu:0'):
batch_size = 128
embedding_size = 128  # Dimension of the embedding vector.
skip_window = 1       # How many words to consider left and right.
num_skips = 2         # How many times to reuse an input to generate a label.
# We pick a random validation set to sample nearest neighbors. Here we limit the
# validation samples to the words that have a low numeric ID, which by
# construction are also the most frequent.
valid_size = 16     # Random set of words to evaluate similarity on.
valid_window = 100  # Only pick dev samples in the head of the distribution.
valid_examples = np.array(random.sample(range(valid_window), valid_size))
num_sampled = 64    # Number of negative examples to sample.

graph = tf.Graph()

with graph.as_default(), tf.device('/cpu:0'):
    # Input data.
    train_dataset = tf.placeholder(tf.int32, shape=[batch_size])
    train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
    valid_dataset = tf.constant(valid_examples, dtype=tf.int32)

    # Variables.
    embeddings = tf.Variable(
        tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
    softmax_weights = tf.Variable(
        tf.truncated_normal([vocabulary_size, embedding_size],
                            stddev=1.0 / math.sqrt(embedding_size)))
    softmax_biases = tf.Variable(tf.zeros([vocabulary_size]))

    # Model.
    # Look up embeddings for inputs.
    embed = tf.nn.embedding_lookup(embeddings, train_dataset)

    # Compute the softmax loss, using a sample of the negative labels each time.
    loss = tf.reduce_mean(
        tf.nn.sampled_softmax_loss(weights=softmax_weights,
                                   biases=softmax_biases, inputs=embed,
                                   labels=train_labels, num_sampled=num_sampled,
                                   num_classes=vocabulary_size))

    # Optimizer.
    # Note: The optimizer will optimize the softmax_weights AND the embeddings.
    # This is because the embeddings are defined as a variable quantity and the
    # optimizer's `minimize` method will by default modify all variable quantities
    # that contribute to the tensor it is passed.
    # See docs on `tf.train.Optimizer.minimize()` for more details.
    optimizer = tf.train.AdagradOptimizer(1.0).minimize(loss)

    # Compute the similarity between minibatch examples and all embeddings.
    # We use the cosine distance:
    norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
    normalized_embeddings = embeddings / norm
    valid_embeddings = tf.nn.embedding_lookup(normalized_embeddings, valid_dataset)
    similarity = tf.matmul(valid_embeddings, tf.transpose(normalized_embeddings))
When trying to switch to a GPU, the following exception is raised:
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'Variable_2/Adagrad': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
I would like to know why the provided graph cannot be computed on a GPU. Does it happen because of the tf.int32 type? Or should I switch to another optimizer? In other words, is there any way to process the Word2Vec model on a GPU (without type casting)?
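One way to investigate which operations fail to be placed on the GPU is to enable device-placement logging in the session configuration. This is a minimal sketch, assuming the TF 1.x graph-mode API and the graph built above:

config = tf.ConfigProto(log_device_placement=True)  # print each op's assigned device
with tf.Session(graph=graph, config=config) as session:
    # Ops pinned to '/gpu:0' that have no GPU kernel raise the
    # InvalidArgumentError shown above at this point.
    tf.global_variables_initializer().run()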
UPDATE
Following Akshay Agrawal's suggestion, here is an updated fragment of the original code that achieves the required result:
with graph.as_default(), tf.device('/gpu:0'):
    # Input data.
    train_dataset = tf.placeholder(tf.int32, shape=[batch_size])
    train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
    valid_dataset = tf.constant(valid_examples, dtype=tf.int32)

    embeddings = tf.Variable(
        tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
    softmax_weights = tf.Variable(
        tf.truncated_normal([vocabulary_size, embedding_size],
                            stddev=1.0 / math.sqrt(embedding_size)))
    softmax_biases = tf.Variable(tf.zeros([vocabulary_size]))
    embed = tf.nn.embedding_lookup(embeddings, train_dataset)

    with tf.device('/cpu:0'):
        loss = tf.reduce_mean(
            tf.nn.sampled_softmax_loss(weights=softmax_weights,
                                       biases=softmax_biases,
                                       inputs=embed,
                                       labels=train_labels,
                                       num_sampled=num_sampled,
                                       num_classes=vocabulary_size))

    optimizer = tf.train.AdamOptimizer(0.001).minimize(loss)

    norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
    normalized_embeddings = embeddings / norm
    valid_embeddings = tf.nn.embedding_lookup(normalized_embeddings, valid_dataset)
    similarity = tf.matmul(valid_embeddings, tf.transpose(normalized_embeddings))
The error arises because AdagradOptimizer does not have a GPU kernel for its sparse-apply operation; sparse apply is triggered because differentiating through the embedding lookup yields sparse gradients. GradientDescentOptimizer and AdamOptimizer do support sparse apply operations. If you switch to one of those optimizers, you will unfortunately see another error: tf.nn.sampled_softmax_loss appears to create an op that has no GPU kernel. To work around that, you can wrap the loss = tf.reduce_mean(... line in a with tf.device('/cpu:0'): context, although doing so introduces CPU-GPU communication overhead.
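An alternative to pinning the loss to the CPU by hand is to enable soft placement, which lets TensorFlow fall back to a CPU kernel automatically for any op that has none on the GPU. A minimal sketch, assuming TF 1.x; batch_data and batch_labels stand for a minibatch produced by the example's generate_batch helper:

config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(graph=graph, config=config) as session:
    tf.global_variables_initializer().run()
    # Ops without a GPU kernel (e.g. Adagrad's sparse apply) silently run on
    # the CPU instead of raising InvalidArgumentError.
    _, l = session.run([optimizer, loss],
                       feed_dict={train_dataset: batch_data,
                                  train_labels: batch_labels})

The trade-off is the same implicit CPU-GPU transfer mentioned above, but the graph definition stays free of manual device annotations.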