word2vec_basic 不工作(张量流)
word2vec_basic not working (Tensorflow)
我是词嵌入和 Tensorflow 的新手。我正在做一个项目,我需要将 word2vec 应用于健康数据。
我使用了 Tensorflow 网站的代码 (word2vec_basic.py
)。我稍微修改了这段代码,让它读取我的数据而不是 "text8.zip" 并且它正常运行直到最后一步:
num_steps = 100001
with tf.Session(graph=graph) as session:
# We must initialize all variables before we use them.
tf.initialize_all_variables().run()
print('Initialized')
average_loss = 0
for step in range(num_steps):
batch_data, batch_labels = generate_batch(
batch_size, num_skips, skip_window)
feed_dict = {train_dataset : batch_data, train_labels : batch_labels}
_, l = session.run([optimizer, loss], feed_dict=feed_dict)
average_loss += l
if step % 2000 == 0:
if step > 0:
average_loss = average_loss / 2000
# The average loss is an estimate of the loss over the last 2000 batches.
print('Average loss at step %d: %f' % (step, average_loss))
average_loss = 0
# note that this is expensive (~20% slowdown if computed every 500 steps)
if step % 10000 == 0:
sim = similarity.eval()
for i in range(valid_size):
valid_word = reverse_dictionary[valid_examples[i]]
top_k = 8 # number of nearest neighbors
nearest = (-sim[i, :]).argsort()[1:top_k+1]
log = 'Nearest to %s:' % valid_word
for k in range(top_k):
close_word = reverse_dictionary[nearest[k]]
log = '%s %s,' % (log, close_word)
print(log)
final_embeddings = normalized_embeddings.eval()<code>
这段代码和例子完全一样,所以我认为没有错。它给出的错误是:
KeyError Traceback (most recent call last)
<ipython-input-20-fc4c5c915fc6> in <module>()
34 for k in xrange(top_k):
35 print(nearest[k])
---> 36 close_word = reverse_dictionary[nearest[k]]
37 log_str = "%s %s," % (log_str, close_word)
38 print(log_str)
KeyError: 2868
我更改了输入数据的大小,但仍然出现相同的错误。
如果有人能给我一些解决此问题的建议,我将不胜感激。
如果词汇量小于默认的最大值 (50000),您应该修改数量。
在第 2 步的最后,我们将 vocabulary_size 修改为实际字典大小。
data, count, dictionary, reverse_dictionary = build_dataset(words)
del words # Hint to reduce memory.
print('Most common words (+UNK)', count[:5])
print('Sample data', data[:10], [reverse_dictionary[i] for i in data[:10]])
#add this line to modify
vocabulary_size = len(dictionary)
print('Dictionary size', len(dictionary))
我是词嵌入和 Tensorflow 的新手。我正在做一个项目,我需要将 word2vec 应用于健康数据。
我使用了 Tensorflow 网站的代码 (word2vec_basic.py
)。我稍微修改了这段代码,让它读取我的数据而不是 "text8.zip" 并且它正常运行直到最后一步:
num_steps = 100001
with tf.Session(graph=graph) as session:
# We must initialize all variables before we use them.
tf.initialize_all_variables().run()
print('Initialized')
average_loss = 0
for step in range(num_steps):
batch_data, batch_labels = generate_batch(
batch_size, num_skips, skip_window)
feed_dict = {train_dataset : batch_data, train_labels : batch_labels}
_, l = session.run([optimizer, loss], feed_dict=feed_dict)
average_loss += l
if step % 2000 == 0:
if step > 0:
average_loss = average_loss / 2000
# The average loss is an estimate of the loss over the last 2000 batches.
print('Average loss at step %d: %f' % (step, average_loss))
average_loss = 0
# note that this is expensive (~20% slowdown if computed every 500 steps)
if step % 10000 == 0:
sim = similarity.eval()
for i in range(valid_size):
valid_word = reverse_dictionary[valid_examples[i]]
top_k = 8 # number of nearest neighbors
nearest = (-sim[i, :]).argsort()[1:top_k+1]
log = 'Nearest to %s:' % valid_word
for k in range(top_k):
close_word = reverse_dictionary[nearest[k]]
log = '%s %s,' % (log, close_word)
print(log)
final_embeddings = normalized_embeddings.eval()<code>
这段代码和例子完全一样,所以我认为没有错。它给出的错误是:
KeyError Traceback (most recent call last)
<ipython-input-20-fc4c5c915fc6> in <module>()
34 for k in xrange(top_k):
35 print(nearest[k])
---> 36 close_word = reverse_dictionary[nearest[k]]
37 log_str = "%s %s," % (log_str, close_word)
38 print(log_str)
KeyError: 2868
我更改了输入数据的大小,但仍然出现相同的错误。
如果有人能给我一些解决此问题的建议,我将不胜感激。
如果词汇量小于默认的最大值 (50000),您应该修改数量。
在第 2 步的最后,我们将 vocabulary_size 修改为实际字典大小。
data, count, dictionary, reverse_dictionary = build_dataset(words)
del words # Hint to reduce memory.
print('Most common words (+UNK)', count[:5])
print('Sample data', data[:10], [reverse_dictionary[i] for i in data[:10]])
#add this line to modify
vocabulary_size = len(dictionary)
print('Dictionary size', len(dictionary))