Word cloud in R with each line as input

I have a file with one column and 190178 rows, a few of which look like this:

anatomical_structure_development
nucleic_acid_binding_transcription_factor_activity
molecular_function
biological_process
biosynthetic_process
cellular_nitrogen_compound_metabolic_process
embryo_development
anatomical_structure_formation_involved_in_morphogenesis
immune_system_process
biosynthetic_process
cellular_nitrogen_compound_metabolic_process
embryo_development

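I want to make a word cloud of this data with the tm and wordcloud packages in R, treating each line as a single input and sizing it by how often that line occurs. I tried the simple instructions from the common "speech" corpus example, but with that approach the word "process" gets the highest frequency and the largest size, which is not what I want. I want the most frequent lines to be the largest.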
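To see why whole-line counting matters, here is a small base-R sketch on the sample lines above. It splits each line on "_" purely to illustrate word-level counting (an assumption for the demo; it is not necessarily how tm tokenizes), and compares that with counting whole lines via `table()`:

```r
# Sample lines from the question (one term per line)
text <- c("anatomical_structure_development",
          "nucleic_acid_binding_transcription_factor_activity",
          "molecular_function",
          "biological_process",
          "biosynthetic_process",
          "cellular_nitrogen_compound_metabolic_process",
          "embryo_development",
          "anatomical_structure_formation_involved_in_morphogenesis",
          "immune_system_process",
          "biosynthetic_process",
          "cellular_nitrogen_compound_metabolic_process",
          "embryo_development")

# Word-level counting (illustration only): split every line on "_" and count the pieces.
# Generic words shared by many lines, such as "process", end up on top.
word_freq <- table(unlist(strsplit(text, "_", fixed = TRUE)))

# Line-level counting: every whole line is one token, which is what the question asks for.
line_freq <- table(text)

print(sort(word_freq, decreasing = TRUE)[1:3])
print(sort(line_freq, decreasing = TRUE)[1:3])
```

With the sample data, "process" is the most frequent word (6 occurrences), while at line level no term occurs more than twice.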

I used the following code from the common examples, but it doesn't give me what I want:

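library(tm)
library(wordcloud)
# Every file in the folder becomes one document; wordcloud() then counts individual words, not whole lines
GO <- Corpus(DirSource("/home/student-a/Desktop/Untitled Folder/"))
wordcloud(GO)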

How can I do this?

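This works for the example, but uses wordcloud2. With wordcloud, words that are too long trigger a warning such as:

anatomical_structure_formation_involved_in_morphogenesis could not be fit on page. It will not be plotted.

wordcloud2 isn't very fast at plotting either, though, and you need to open the viewer to see the result.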
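If you prefer to stay with the wordcloud package, one option (a sketch, not tested against the full 190178-line file) is to pass the per-line counts directly through the `words` and `freq` arguments and lower `scale` so long terms are less likely to be dropped. The `scale` values below are an assumption; what fits depends on the size of the plotting device. The plot call is guarded so the counting part still runs if wordcloud is not installed.

```r
text <- c("anatomical_structure_development",
          "nucleic_acid_binding_transcription_factor_activity",
          "molecular_function",
          "biological_process",
          "biosynthetic_process",
          "cellular_nitrogen_compound_metabolic_process",
          "embryo_development",
          "anatomical_structure_formation_involved_in_morphogenesis",
          "immune_system_process",
          "biosynthetic_process",
          "cellular_nitrogen_compound_metabolic_process",
          "embryo_development")

# Count whole lines, then split the table into parallel vectors of terms and counts
line_freq <- table(text)
words <- names(line_freq)
freqs <- as.vector(line_freq)

if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = words, freq = freqs,
                       min.freq = 1,          # keep terms that occur only once
                       scale = c(2, 0.5),     # smaller maximum size (assumed values) so long terms fit
                       random.order = FALSE)  # plot terms in decreasing frequency
}
```

A larger device (for example a bigger `png()` canvas) is another way to make room for long terms without shrinking everything.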

Code with wordcloud2:

library(wordcloud2)
library(dplyr)

text <- c("anatomical_structure_development",
          "nucleic_acid_binding_transcription_factor_activity",
          "molecular_function",
          "biological_process",
          "biosynthetic_process",
          "cellular_nitrogen_compound_metabolic_process",
          "embryo_development",
          "anatomical_structure_formation_involved_in_morphogenesis",
          "immune_system_process",
          "biosynthetic_process",
          "cellular_nitrogen_compound_metabolic_process",
          "embryo_development")

# wordcloud2 needs a data.frame with frequencies. This will generate the table from the text.
df <- text %>% tibble(words = .) %>% 
  group_by(words) %>% 
  summarise(freq = n())

wordcloud2(df)
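For the real file, the lines can be read and counted directly instead of being typed into `text`. A base-R sketch: the file is hypothetical (a temporary file with a few sample lines stands in for the asker's file), and leading/trailing whitespace is trimmed and blank lines are dropped so they don't show up as their own entries.

```r
# Hypothetical stand-in for the one-column input file from the question
path <- tempfile(fileext = ".txt")
writeLines(c("biosynthetic_process", "embryo_development",
             "biosynthetic_process", "", " immune_system_process "), path)

lines <- trimws(readLines(path))
lines <- lines[nzchar(lines)]  # drop empty lines

# One row per distinct line with its count, in the word/freq layout wordcloud2 expects
freq_df <- as.data.frame(table(lines), stringsAsFactors = FALSE)
names(freq_df) <- c("word", "freq")
freq_df <- freq_df[order(freq_df$freq, decreasing = TRUE), ]

if (requireNamespace("wordcloud2", quietly = TRUE)) {
  # Assigned rather than auto-printed; evaluate `wc` at the console to open it in the viewer
  wc <- wordcloud2::wordcloud2(freq_df)
}
```

Sorting by frequency is optional; it only makes the table easier to inspect before plotting.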