Can someone explain the unsupported operand error I'm getting while running the file blei_lda.py from Building Machine Learning Systems with Python?
I have been trying, without success, to run the file blei_lda.py from Chapter 4 of Building Machine Learning Systems with Python. I am using Python 2.7 with the Enthought Canopy GUI. Below is the actual file supplied by the authors, although there are also several copies of it on GitHub.
The problem is that I keep getting this error:
TypeError Traceback (most recent call last)
c:\users\matt\desktop\pythonprojects\pml\ch04\blei_lda.py in <module>()
for ti in range(model.num_topics):
words = model.show_topic(ti, 64)
------>tf = sum(f for f, w in words)
with open('topics.txt', 'w') as output:
output.write('\n'.join('{}:{}'.format(w, int(1000. * f / tf)) for f, w in words))
output.write("\n\n\n")
TypeError: unsupported operand type(s) for +: 'int' and 'unicode'
I have tried to put together a workaround, but nothing I came up with works completely.
I have also searched the web and Stack Overflow for a solution, but it seems I am the only one running into this problem with this file.
# This code is supporting material for the book
# Building Machine Learning Systems with Python
# by Willi Richert and Luis Pedro Coelho
# published by PACKT Publishing
#
# It is made available under the MIT License
from __future__ import print_function
from wordcloud import create_cloud
try:
    from gensim import corpora, models, matutils
except:
    print("import gensim failed.")
    print()
    print("Please install it")
    raise
import matplotlib.pyplot as plt
import numpy as np
from os import path
NUM_TOPICS = 100
# Check that data exists
if not path.exists('./data/ap/ap.dat'):
    print('Error: Expected data to be present at data/ap/')
    print('Please cd into ./data & run ./download_ap.sh')
# Load the data
corpus = corpora.BleiCorpus('./data/ap/ap.dat', './data/ap/vocab.txt')
# Build the topic model
model = models.ldamodel.LdaModel(
    corpus, num_topics=NUM_TOPICS, id2word=corpus.id2word, alpha=None)
# Iterate over all the topics in the model
for ti in range(model.num_topics):
    words = model.show_topic(ti, 64)
    tf = sum(f for f, w in words)
    with open('topics.txt', 'w') as output:
        output.write('\n'.join('{}:{}'.format(w, int(1000. * f / tf)) for f, w in words))
        output.write("\n\n\n")
# We first identify the most discussed topic, i.e., the one with the
# highest total weight
topics = matutils.corpus2dense(model[corpus], num_terms=model.num_topics)
weight = topics.sum(1)
max_topic = weight.argmax()
# Get the top 64 words for this topic
# Without the argument, show_topic would return only 10 words
words = model.show_topic(max_topic, 64)
# This function will actually check for the presence of pytagcloud and is otherwise a no-op
create_cloud('cloud_blei_lda.png', words)
num_topics_used = [len(model[doc]) for doc in corpus]
fig,ax = plt.subplots()
ax.hist(num_topics_used, np.arange(42))
ax.set_ylabel('Nr of documents')
ax.set_xlabel('Nr of topics')
fig.tight_layout()
fig.savefig('Figure_04_01.png')
# Now, repeat the same exercise using alpha=1.0
# You can edit the constant below to play around with this parameter
ALPHA = 1.0
model1 = models.ldamodel.LdaModel(
    corpus, num_topics=NUM_TOPICS, id2word=corpus.id2word, alpha=ALPHA)
num_topics_used1 = [len(model1[doc]) for doc in corpus]
fig,ax = plt.subplots()
ax.hist([num_topics_used, num_topics_used1], np.arange(42))
ax.set_ylabel('Nr of documents')
ax.set_xlabel('Nr of topics')
# The coordinates below were fit by trial and error to look good
ax.text(9, 223, r'default alpha')
ax.text(26, 156, 'alpha=1.0')
fig.tight_layout()
fig.savefig('Figure_04_02.png')
In this line: words = model.show_topic(ti, 64), words is a list of (unicode, float64) tuples,
e.g. [(u'school', 0.029515796999228502), (u'prom', 0.018586355008452897)]
So in the line tf = sum(f for f, w in words), f is bound to the unicode word and w to the float weight, which means you are summing unicode values, and that is what raises the unsupported operand type error.
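Here is a quick toy reproduction (my own example, not part of the book's code) that shows both the failure and the fix on a hand-made list of the same shape:

words = [(u'school', 0.029515796999228502), (u'prom', 0.018586355008452897)]

# Unpacking as (f, w) binds f to the unicode word, so sum() ends up computing 0 + u'school'
try:
    tf = sum(f for f, w in words)
except TypeError as e:
    print(e)  # unsupported operand type(s) for +: 'int' and 'unicode'

# Unpacking as (w, f) binds f to the float weight, so the sum works
tf = sum(f for w, f in words)
print(tf)  # ~0.0481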
Change that line to tf = sum(f for w, f in words), so that it now sums the float values.
Similarly, change this line to output.write('\n'.join('{}:{}'.format(w, int(1000. * f / tf)) for w, f in words)).
The corrected snippet then looks like this:
for ti in range(model.num_topics):
    words = model.show_topic(ti, 64)
    tf = sum(f for w, f in words)
    with open('topics.txt', 'w') as output:
        output.write('\n'.join('{}:{}'.format(w, int(1000. * f / tf)) for w, f in words))
        output.write("\n\n\n")