Wordcloud with each line as input in R

Question

I have a file with one column and 190178 rows. A few of the rows look like this:

anatomical_structure_development
nucleic_acid_binding_transcription_factor_activity
molecular_function
biological_process
biosynthetic_process
cellular_nitrogen_compound_metabolic_process
embryo_development
anatomical_structure_formation_involved_in_morphogenesis
immune_system_process
biosynthetic_process
cellular_nitrogen_compound_metabolic_process
embryo_development

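I want to make a word cloud of this data with the tm and wordcloud packages in R, treating each line as a single input, so that the cloud reflects how often each line occurs. I tried the simple instructions from the "speech" corpus example, but then the word "process" comes out as the most frequent and is drawn largest, which is not what I want. I want the most frequent line to be the largest.

I used the following code from the common examples, but it did not give me what I want: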

library(tm)
library(wordcloud)
GO <- Corpus(DirSource("/home/student-a/Desktop/Untitled Folder/"))
wordcloud(GO)

How can I do this?

Answer 1

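This works for the example, but it uses wordcloud2 rather than wordcloud. wordcloud gives a warning when a word is too long to fit, like the one below. Note that wordcloud2 is not very fast at plotting either, and you need to open the viewer to see the result.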

anatomical_structure_formation_involved_in_morphogenesis could not be fit on page. It will not be plotted.
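If you would rather stay with wordcloud, here is a minimal sketch that counts whole lines with base R table() and passes the counts directly to wordcloud() instead of building a tm corpus. The scale values are assumptions chosen only for illustration; a smaller scale may let long terms fit, but whether it does depends on the size of the plotting device.

```r
# Count how often each whole line occurs (no splitting into single words).
terms <- c("biosynthetic_process", "embryo_development",
           "biosynthetic_process", "immune_system_process",
           "anatomical_structure_formation_involved_in_morphogenesis")
counts <- table(terms)

# Plot only if the wordcloud package is installed.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = names(counts),
                       freq = as.vector(counts),
                       min.freq = 1,         # the default of 3 would drop rare terms
                       scale = c(1.5, 0.3),  # assumed values, smaller than the default c(4, 0.5)
                       rot.per = 0)          # keep long terms horizontal
}
```

Terms that still do not fit are skipped with the same warning, so the scale may need further tuning for the full data.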

Code with wordcloud2:

library(wordcloud2)
library(dplyr)

text <- c("anatomical_structure_development",
          "nucleic_acid_binding_transcription_factor_activity",
          "molecular_function",
          "biological_process",
          "biosynthetic_process",
          "cellular_nitrogen_compound_metabolic_process",
          "embryo_development",
          "anatomical_structure_formation_involved_in_morphogenesis",
          "immune_system_process",
          "biosynthetic_process",
          "cellular_nitrogen_compound_metabolic_process",
          "embryo_development")

# wordcloud2 needs a data.frame with the words in the first column and
# their frequencies in the second. This builds that table from the text.
# tibble() replaces data_frame(), which is deprecated.
df <- tibble(words = text) %>%
  group_by(words) %>%
  summarise(freq = n())

wordcloud2(df)
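For the real file, the lines have to be read in rather than typed out. A rough sketch of that step using only base R for the counting follows; the temporary file is a hypothetical stand-in for the actual one-column file, so replace path with the real location.

```r
# Hypothetical input: a temporary file standing in for the real one-column file.
path <- tempfile(fileext = ".txt")
writeLines(c("biosynthetic_process", "embryo_development",
             "biosynthetic_process", "molecular_function"), path)

# Read each line as one term and drop empty lines.
go_terms <- readLines(path)
go_terms <- go_terms[nzchar(go_terms)]

# Count identical lines, most frequent first.
counts <- sort(table(go_terms), decreasing = TRUE)

# wordcloud2 takes the word from the first column and the frequency from the second.
df <- data.frame(word = names(counts), freq = as.vector(counts),
                 stringsAsFactors = FALSE)

# Build the widget only if wordcloud2 is installed; print(cloud) shows it in the viewer.
if (requireNamespace("wordcloud2", quietly = TRUE)) {
  cloud <- wordcloud2::wordcloud2(df)
}
```

With 190178 rows the table of distinct terms should be much smaller than the input, so it may be worth inspecting head(df) before plotting.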
