Generate a word cloud to show frequencies of numbers in Python
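I have a pandas DataFrame containing students' grade points. I want to generate a word cloud, or number cloud, for the grades. Is there a way to do this? I have tried everything I could think of, but all my efforts were in vain. Basically, what I want is a word cloud made up of the numbers from the CGPA column.
Here is what I tried: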
import pandas as pd
from wordcloud import WordCloud
import matplotlib.pyplot as plt
df = pd.read_csv("VTU_marks.csv")
# rounding off
df = df[df['CGPA'].isnull() == False]
df['CGPA'] = df['CGPA'].round(decimals=2)
wordcloud = WordCloud(max_font_size=50,max_words=100,background_color="white").generate(string)
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
But I get an error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-47-29ec36ebbb1e> in <module>()
----> 1 wordcloud = WordCloud(max_font_size=50, max_words=100, background_color="white").generate(string)
2 plt.figure()
3 plt.imshow(wordcloud, interpolation="bilinear")
4 plt.axis("off")
5 plt.show()
/usr/local/lib/python3.6/dist-packages/wordcloud/wordcloud.py in generate(self, text)
603 self
604 """
--> 605 return self.generate_from_text(text)
606
607 def _check_generated(self):
/usr/local/lib/python3.6/dist-packages/wordcloud/wordcloud.py in generate_from_text(self, text)
585 """
586 words = self.process_text(text)
--> 587 self.generate_from_frequencies(words)
588 return self
589
/usr/local/lib/python3.6/dist-packages/wordcloud/wordcloud.py in generate_from_frequencies(self, frequencies, max_font_size)
381 if len(frequencies) <= 0:
382 raise ValueError("We need at least 1 word to plot a word cloud, "
--> 383 "got %d." % len(frequencies))
384 frequencies = frequencies[:self.max_words]
385
ValueError: We need at least 1 word to plot a word cloud, got 0.
You can find the data here.
Once the data is set up and rounded as required, we can count the frequency of each score:
counts = df['CGPA'].value_counts()
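To make the counting step concrete, here is a minimal sketch on a small hand-made sample. The values in `cgpa_values` are hypothetical (they are not taken from VTU_marks.csv), and `collections.Counter` is used only as a stand-in for what `value_counts()` does on the CGPA column.

```python
from collections import Counter

# Hypothetical sample of CGPA values; the real data lives in VTU_marks.csv.
cgpa_values = [8.456, 8.46, 7.1, 9.0, 7.1, 8.46, None]

# Drop missing entries and round to two decimals, mirroring the pandas steps.
rounded = [round(v, 2) for v in cgpa_values if v is not None]

# Count how often each rounded value occurs (the role value_counts() plays).
counts = Counter(rounded)

# Most frequent grades first, as value_counts() would list them.
for grade, n in counts.most_common():
    print(grade, n)
```

Note that the keys here are still floats, just like the index of the pandas result.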
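We need to make sure the index here consists of strings; floats raise an error (this is what went wrong in your attempt). So we can convert them to strings: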
counts.index = counts.index.map(str)
#Below alternative works for pandas versions >= 0.19.0
#counts.index = counts.index.astype(str)
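One possible illustration of why plain text made of decimal numbers can end up as "0 words" is sketched below. It rests on two assumptions that are not confirmed by the post: that the text passed to `.generate()` was a space-separated run of grades (the `text` value here is hypothetical), and that the tokenizer uses the pattern `\w[\w']+` and then discards digit-only tokens. Only the standard `re` module is used.

```python
import re

# Hypothetical text built by joining decimal grades with spaces.
text = "8.5 9.12 7.0 8.5"

# Assumed tokenizer pattern: one word character followed by at least one
# more word character or apostrophe. "." is not a word character, so
# every decimal number is split at the point.
tokens = re.findall(r"\w[\w']+", text)

# Assumed follow-up filter: tokens consisting only of digits are dropped.
words = [t for t in tokens if not t.isdigit()]

print(tokens)  # ['12']
print(words)   # []
```

Under these assumptions nothing survives tokenization, which is why passing ready-made string keys with their counts sidesteps the problem.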
Then we can use the .generate_from_frequencies method to get what you want:
wordcloud = WordCloud().generate_from_frequencies(counts)
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
This gives me the following:
Full MWE:
import pandas as pd
from wordcloud import WordCloud
import matplotlib.pyplot as plt

df = pd.read_csv("VTU_marks.csv")

# Drop rows with a missing CGPA, then round to two decimals
df = df.dropna(subset=['CGPA']).copy()
df['CGPA'] = df['CGPA'].round(decimals=2)

# Frequency of each distinct CGPA value
counts = df['CGPA'].value_counts()

# The index must hold strings, not floats
counts.index = counts.index.map(str)
#counts.index = counts.index.astype(str)

# Build the cloud directly from the value/frequency pairs
wordcloud = WordCloud().generate_from_frequencies(counts)

plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()