Plot a word cloud without stopwords
I want to plot a word cloud from a column of my pandas DataFrame.
Here is my code:
all_words=''.join( [tweet for tweet in tweet_table['tokens'] ] )
word_Cloud=WordCloud(width=500, height=300, random_state=21, max_font_size=119).generate(all_words)
plt.imshow(word_Cloud, interpolation='bilinear')
The column I want to plot, tweet_table['tokens'],
looks like this:
0 [da, trumpanzee, follower, blm, balance, wp, g...
1 [counting, blacklivesmatter, received, trainin...
2 [okay, like, little, kids, pretty, smart, know...
3 [thank, oscopelabs, got, mounted, loud, amp, p...
4 [bpi, proud, supported, hoops, 4l, f, e, see, ...
...
44713 [tomorrow, buy, charity, compilation, undergro...
44714 [needs, erected, state, capitol, think, darkfa...
44715 [clay, county, sheriffs, motto, screw, amp, in...
44716 [films, eleven, films, bravo, bad, ass, video,...
44717 [everybody, give, listen, blm]
The code above raises the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-227-4066d6d1a153> in <module>
2 # REMOVE STOP WORDS
3
----> 4 all_words=''.join( [tweet for tweet in tweet_table['tokens'] ] )
TypeError: sequence item 0: expected str instance, list found
How can I fix this error? The tweet_table['tokens'] column is already tokenized and has had all stopwords removed.
Thanks a lot.
P.S. When I run similar code on the tweet_table['clean_text'] column, it works fine.
The tweet_table['clean_text'] column
looks like this:
0 You have a da trumpanzee follower in ...
1 Over 279 and counting If BlackLivesMatte...
2 Okay but like little kids are pretty smart and...
3 Thank you oscopelabs got it mounted loud amp...
4 BPI is proud to have supported Hoops4L Y F E ...
...
44713 TOMORROW you can buy the charity compilation...
44714 That needs to be erected at the State Capi...
44715 Clay County Sheriffs Motto To Screw amp ...
44716 Films Eleven Films bravo Bad ass vid...
44717 everybody should give this a listen ...
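I just fixed it by flattening the token lists and joining the words with spaces:
import matplotlib.pyplot as plt
from wordcloud import WordCloud
# Flatten the per-tweet token lists into one space-separated string
allwords = ' '.join(word for tokens in tweet_table['tokens'] for word in tokens)
word_Cloud = WordCloud(width=500, height=300, random_state=21,
                       max_font_size=119).generate(allwords)
plt.imshow(word_Cloud, interpolation='bilinear')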
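To make the cause of the error concrete, here is a minimal pandas-free sketch. `rows` is a hypothetical stand-in for tweet_table['tokens'], built from two shortened rows of the output shown in the question. It assumes each cell really is a Python list of strings, which is what the traceback suggests.

```python
# Hypothetical stand-in for tweet_table['tokens']: each row is a list of tokens.
rows = [
    ["everybody", "give", "listen", "blm"],
    ["films", "eleven", "films", "bravo"],
]

# str.join only accepts strings, so joining the rows directly fails.
try:
    "".join(rows)
except TypeError as exc:
    error_message = str(exc)

# Flatten the nested lists, then join with a space so the words stay separated.
all_words = " ".join(word for tokens in rows for word in tokens)

print(error_message)
print(all_words)
```

The clean_text column works with a plain join because its cells are already strings. Note also that calling str() on a whole pandas Series returns its printed summary, which by default elides the middle rows and truncates long cells, so it is not a substitute for flattening.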
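Here tweet_table['tokens']
does not contain any stopwords. Otherwise, create a list of stopwords and pass it to WordCloud through the stopwords argument, as in the code below: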
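import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS
# Extend the built-in stopword set with extra words to drop
stopwords_newlist = ["https", "co"] + list(STOPWORDS)
# Flatten the per-tweet token lists into one space-separated string
allwords = ' '.join(word for tokens in tweet_table['tokens'] for word in tokens)
word_Cloud = WordCloud(width=500, height=300, random_state=21, stopwords=stopwords_newlist,
                       max_font_size=119).generate(allwords)
plt.imshow(word_Cloud, interpolation='bilinear')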
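Since the column is already tokenized, an alternative is to drop the extra stopwords token by token and count what remains. The sketch below uses only the standard library. `rows` and `extra_stopwords` are hypothetical names, and the "https" and "co" tokens were added to the sample rows purely to illustrate the filtering.

```python
from collections import Counter

# Hypothetical token rows; "https" and "co" are included only to show filtering.
rows = [
    ["everybody", "give", "listen", "blm", "https", "co"],
    ["films", "eleven", "films", "bravo", "co"],
]
extra_stopwords = {"https", "co"}  # the same extra words used in the answer

# Drop stopwords token by token, then count the remaining words.
frequencies = Counter(
    word for tokens in rows for word in tokens if word not in extra_stopwords
)

print(frequencies.most_common(3))
```

A word-to-count mapping like this can be passed to WordCloud.generate_from_frequencies instead of generate. As far as I know, the stopwords argument is ignored on that path, so the filtering has to happen beforehand, as it does here.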