How to make a word cloud from a dataframe column that contains lists?
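In Python 3 and pandas, I have a dataframe "proposicoes" with a column that contains lists of words. The column is named "ementa_token".
I want to make a word cloud from the "ementa_token" column. Each row holds a list of words: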
proposicoes[proposicoes['id'] == '465465']['ementa_token'].iloc[0]
['Comunica',
'Excelentíssimo',
'Senhor',
'Presidente',
'República',
'sanção',
'projeto',
'lei',
'Institui',
'Fundo',
'Nacional',
'Idoso',
'autoriza',
'deduzir',
'imposto',
'renda',
'devido',
'pessoas',
'físicas',
'jurídicas',
'doações',
'efetuadas',
'Fundos',
'Municipais',
'Estaduais',
'Nacional',
'Idoso',
'altera',
'Lei',
'nº',
'9250',
'26',
'dezembro',
'1995',
'restitui',
'arquivo',
'Congresso',
'Nacional',
'dois',
'autógrafos',
'texto',
'ora',
'convertido',
'Lei',
'nº',
'12213',
'20',
'janeiro',
'2010']
I tried this:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
%matplotlib inline
wordcloud = WordCloud(width=800, height=400).generate(proposicoes['ementa_token'])
plt.figure( figsize=(30,20) )
plt.imshow(wordcloud)
plt.axis("off")
plt.show()
I get this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-18-c072e91a9fe7> in <module>
----> 1 wordcloud = WordCloud(width=800, height=400).generate(proposicoes['ementa_token'])
2 plt.figure( figsize=(30,20) )
3 plt.imshow(wordcloud)
4 plt.axis("off")
5 plt.show()
c:\users\reinaldo\documents\code\palavras\lib\site-packages\wordcloud\wordcloud.py in generate(self, text)
603 self
604 """
--> 605 return self.generate_from_text(text)
606
607 def _check_generated(self):
c:\users\reinaldo\documents\code\palavras\lib\site-packages\wordcloud\wordcloud.py in generate_from_text(self, text)
584 self
585 """
--> 586 words = self.process_text(text)
587 self.generate_from_frequencies(words)
588 return self
c:\users\reinaldo\documents\code\palavras\lib\site-packages\wordcloud\wordcloud.py in process_text(self, text)
551 regexp = self.regexp if self.regexp is not None else r"\w[\w']+"
552
--> 553 words = re.findall(regexp, text, flags)
554 # remove stopwords
555 words = [word for word in words if word.lower() not in stopwords]
c:\users\reinaldo\documents\code\palavras\lib\re.py in findall(pattern, string, flags)
221
222 Empty matches are included in the result."""
--> 223 return _compile(pattern, flags).findall(string)
224
225 def finditer(pattern, string, flags=0):
TypeError: expected string or bytes-like object
Does this mean the code is not reading the words in each row's list? Does anyone know how to do this?
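The TypeError is clear: WordCloud expects a string, not a Series. Concatenate the lists in the column into one list, then join it into a single string:
wordcloud = WordCloud(width=800, height=400).generate(' '.join(proposicoes['ementa_token'].sum()))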
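The failure can be reproduced without WordCloud or pandas. The sketch below is only an illustration: it calls `re.findall` with the default regexp shown in the traceback, first on a list (standing in for the Series) and then on the joined string. The names `tokens`, `pattern`, `error_message` and `words` are made up for this example.

```python
import re

# Stand-in for one row of proposicoes['ementa_token'] (illustrative data).
tokens = ["Fundo", "Nacional", "Idoso"]

# Default regexp used by WordCloud.process_text, as shown in the traceback.
pattern = r"\w[\w']+"

# Passing a non-string object reproduces the TypeError from the question.
try:
    re.findall(pattern, tokens)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Joining the tokens into one string first gives re.findall what it expects.
words = re.findall(pattern, " ".join(tokens))

print(error_message)
print(words)
```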
Option 2:
import numpy as np
data = ' '.join(np.concatenate(proposicoes['ementa_token'].tolist()))
wordcloud = WordCloud(width=800, height=400).generate(data)
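A third possibility, sketched here as an alternative rather than as part of the original answer: since the column is already tokenized, the lists can be flattened with `itertools.chain` and counted with `collections.Counter`, and the counts passed to `WordCloud.generate_from_frequencies`. As far as I can tell this skips the re-tokenizing step that `generate` performs, so short tokens such as 'nº' are not filtered by the regexp. The helper names (`tokens_to_text`, `token_frequencies`, `make_cloud`) and the `sample_rows` data are hypothetical; in the real case you would pass `proposicoes['ementa_token']` instead.

```python
from collections import Counter
from itertools import chain


def tokens_to_text(rows):
    """Flatten an iterable of token lists into one space-separated string."""
    return " ".join(chain.from_iterable(rows))


def token_frequencies(rows):
    """Count every token across all rows, keeping the tokens as they are."""
    return Counter(chain.from_iterable(rows))


def make_cloud(frequencies, width=800, height=400):
    """Build a word cloud from a token -> count mapping."""
    # Imported here so the two helpers above can be used without wordcloud.
    from wordcloud import WordCloud
    return WordCloud(width=width, height=height).generate_from_frequencies(frequencies)


# Toy stand-in for proposicoes['ementa_token']; a pandas Series of lists
# can be iterated row by row in the same way as this list of lists.
sample_rows = [
    ["Institui", "Fundo", "Nacional", "Idoso"],
    ["altera", "Lei", "nº", "9250"],
    ["Congresso", "Nacional"],
]

text = tokens_to_text(sample_rows)
freqs = token_frequencies(sample_rows)
print(text)
print(freqs.most_common(1))
```

With the real dataframe, `make_cloud(token_frequencies(proposicoes['ementa_token']))` should return an object that can be passed to `plt.imshow` as in the question. Flattening with `chain` also avoids the repeated list copying that `Series.sum()` does on a long column.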