Wordcloud with text from a column with list of strings
My dataset has 10 columns, one of which contains text as a list of strings.
Dataset:
Col1 Col2 Col3 Text
... ... ... ['I','have', 'a','dream']
... ... ... ['My', 'mom', 'is','Spanish']
Code:
wordcloud = WordCloud(stopwords=stopwords, max_font_size=50, max_words=100, background_color="white").generate(' '.join(df['Text']))
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
This raises:
TypeError: sequence item 0: expected str instance, list found
Clearly it expects strings, not lists. How can I convert the lists in the Text column to strings?
You can first concatenate the lists in the df['Text'] column with .sum(), then join the result into a single string:
combined_text = ' '.join(df['Text'].sum())
wordcloud = (
    WordCloud(stopwords=stopwords,
              max_font_size=50,
              max_words=100,
              background_color="white")
    .generate(combined_text)
)
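One caveat with .sum(): as far as I know, on a column of lists it concatenates with repeated `+`, which can get slow as the number of rows grows. Below is a minimal sketch of an alternative using `itertools.chain.from_iterable`. The `rows` list is a stand-in for df['Text'], and `flatten_to_text` is a hypothetical helper name, not part of pandas or wordcloud.

```python
from itertools import chain

# Stand-in for df['Text']: any iterable of token lists works here,
# including a pandas Series whose values are lists of strings.
rows = [['I', 'have', 'a', 'dream'], ['My', 'mom', 'is', 'Spanish']]

def flatten_to_text(token_lists):
    """Join every token from every row into one space-separated string."""
    # chain.from_iterable walks the inner lists lazily, so no intermediate
    # concatenated list is built.
    return ' '.join(chain.from_iterable(token_lists))

combined_text = flatten_to_text(rows)
print(combined_text)
```

The resulting string can be passed to .generate() in the same way as combined_text above.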
Since the values in your column are lists, try exploding them first:
wordcloud = (WordCloud(stopwords=stopwords,
                       max_font_size=50,
                       max_words=100,
                       background_color="white")
             .generate(' '.join(df['Text'].explode())))
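If some rows hold an empty list, explode() yields NaN for them, and ' '.join would then fail on a non-string item. A small sketch of guarding against that, assuming a hypothetical sample frame with one empty row:

```python
import pandas as pd

# Hypothetical sample data; the empty list mimics a row with no tokens.
df = pd.DataFrame({'Text': [['I', 'have', 'a', 'dream'], [], ['My', 'mom']]})

# explode() gives one token per row and turns the empty list into NaN,
# so drop the missing values before joining.
tokens = df['Text'].explode().dropna()
combined_text = ' '.join(tokens)
print(combined_text)
```

Whether this guard is needed depends on the data; if every row has at least one token, the plain explode() call is enough.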
Or join the words within each row first:
wordcloud = (WordCloud(stopwords=stopwords,
                       max_font_size=50,
                       max_words=100,
                       background_color="white")
             .generate(' '.join(df['Text'].agg(' '.join))))
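Newer pandas releases appear to warn when .agg is given a function that ends up being applied element by element, so the per-row join may be more robust when written with the `.str.join` accessor. A minimal sketch under that assumption, using a hypothetical two-row frame in place of the real dataset:

```python
import pandas as pd

# Hypothetical sample data mirroring the Text column from the question.
df = pd.DataFrame({'Text': [['I', 'have', 'a', 'dream'],
                            ['My', 'mom', 'is', 'Spanish']]})

# Series.str.join joins the list stored in each row into one string.
row_strings = df['Text'].str.join(' ')

# A second join then merges the per-row strings into a single text.
combined_text = ' '.join(row_strings)
print(combined_text)
```

The intermediate row_strings Series can also be useful on its own, for example to build one word cloud per row.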