How to show a wordcloud from a dataframe in Python
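Currently, I have a dataframe containing words and their weights (tf*idf), and I want to show the words in a wordcloud arranged according to those weights.
The dataframe is shown in the picture on the left.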
def generate_wordcloud(words_tem):
    word_cloud = WordCloud(width=512, height=512, background_color='white', stopwords=None, max_words=20).generate(words_tem)
    plt.figure(figsize=(10, 8), facecolor='white', edgecolor='blue')
    plt.imshow(word_cloud, interpolation='bilinear')
    plt.axis('off')
    plt.tight_layout(pad=0)
    plt.show()
tfidf = TfidfVectorizer(data, lowercase = False)
tfs = tfidf.fit_transform([data])
feature_names = tfidf.get_feature_names()
df = pd.DataFrame(tfs.T.toarray(), index=feature_names, columns= ['weight'])
df = df.sort_values(by = 'weight', ascending = False)
word_lists = df.index.values
unique_str = ' '.join(word_lists)
print(df[0:20])
generate_wordcloud(unique_str)
The most commonly used package is called wordcloud. See
https://github.com/amueller/word_cloud/blob/master/README.md
python -m pip install wordcloud
or with conda:
conda install -c conda-forge wordcloud
You can do the following:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# In a notebook, also run the magic on its own line: %matplotlib inline
text = your_text_data
# Generate a word cloud image
wordcloud = WordCloud().generate(text)
# Display the generated image:
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
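Similar to the above, except that your flow starts from (term, weight) pairs instead of raw text. The steps below start from a gensim TF-IDF model, but your TfidfVectorizer output would work as well, since we are only making a tuple of (term, weight) for each word.
from gensim.models import TfidfModel
# Start from the TF-IDF model step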
tfidf = TfidfModel(vectors)
# Get TF-IDF weights
weights = tfidf[vectors[0]]
# Get terms from the dictionary and pair with weights
weights = [(dictionary[pair[0]], pair[1]) for pair in weights]
# Generate the cloud
wc = WordCloud()
wc.generate_from_frequencies(dict(weights))  # expects a {term: weight} mapping
...
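For the dataframe in the question (words as the index, one `weight` column), the same idea can be sketched as below. This is only a minimal sketch under stated assumptions: the helper names `frequencies_from_frame` and `show_wordcloud`, the `top_n` parameter, and the toy data are made up for illustration, and the plotting imports are placed inside the function so the conversion step can be tried without them installed.

```python
import pandas as pd


def frequencies_from_frame(df, column="weight", top_n=20):
    """Return {word: weight} for the top_n largest positive weights.

    Hypothetical helper: assumes the words are the dataframe index.
    """
    weights = df[column]
    weights = weights[weights > 0].nlargest(top_n)
    return weights.to_dict()


def show_wordcloud(frequencies):
    """Render a {word: weight} mapping as a word cloud image."""
    # Imported here so the conversion step above can be tried on its own.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    cloud = WordCloud(width=512, height=512, background_color="white")
    cloud.generate_from_frequencies(frequencies)
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()


# Toy stand-in for the dataframe in the question (values are invented).
df = pd.DataFrame(
    {"weight": [0.61, 0.35, 0.12, 0.0]},
    index=["python", "cloud", "word", "unused"],
)
frequencies = frequencies_from_frame(df)
# show_wordcloud(frequencies)  # uncomment to display the image
```

Passing the weights this way lets the font size follow the tf*idf value, whereas joining the words into one string and calling `generate` makes every word appear once, so they all get roughly the same size.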