Wordcloud with text from a column with lists of strings

My dataset has 10 columns, one of which contains text stored as a list of strings.

Dataset:

Col1 Col2 Col3 Text
...   ...  ... ['I','have', 'a','dream']
...   ...  ... ['My', 'mom', 'is','Spanish']

Code:

wordcloud = WordCloud(stopwords=stopwords, max_font_size=50, max_words=100, background_color="white").generate(' '.join(df['Text']))
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()

returns

TypeError: sequence item 0: expected str instance, list found

Clearly it expects strings, not lists. How can I convert the lists in the Text column to strings?

You can first concatenate the lists in the column df['Text'] with .sum(), then join the result:

combined_text = ' '.join(df['Text'].sum())

wordcloud = (
    WordCloud(stopwords=stopwords, 
              max_font_size=50, 
              max_words=100,       
              background_color="white")
    .generate(combined_text)
)
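
If the column is large, note that .sum() builds the result by adding lists one at a time, which can get slow. As a rough sketch of an alternative, the rows can be flattened with itertools.chain.from_iterable instead. The `rows` variable below is a stand-in I made up for df['Text'].tolist(), so the snippet runs without pandas:

```python
from itertools import chain

# Hypothetical stand-in for df['Text'].tolist(): one list of words per row.
rows = [
    ['I', 'have', 'a', 'dream'],
    ['My', 'mom', 'is', 'Spanish'],
]

# Flatten the per-row lists lazily, then join all words into one string.
combined_text = ' '.join(chain.from_iterable(rows))
print(combined_text)
```

The resulting combined_text can be passed to .generate() exactly as above.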

Since the values in that column are lists, try exploding them first:

wordcloud = (WordCloud(stopwords=stopwords, 
                       max_font_size=50, 
                       max_words=100, 
                       background_color="white")
                       .generate(' '.join(df['Text'].explode())))

Or join each list into a string first:

wordcloud = (WordCloud(stopwords=stopwords, 
                       max_font_size=50, 
                       max_words=100, 
                       background_color="white")
                       .generate(' '.join(df['Text'].agg(' '.join))))