Word cloud in R with each line as input
I have a file with one column and 190178 rows; a few of the rows look like this:
anatomical_structure_development
nucleic_acid_binding_transcription_factor_activity
molecular_function
biological_process
biosynthetic_process
cellular_nitrogen_compound_metabolic_process
embryo_development
anatomical_structure_formation_involved_in_morphogenesis
immune_system_process
biosynthetic_process
cellular_nitrogen_compound_metabolic_process
embryo_development
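I want to make a word cloud of this data using the tm and wordcloud packages in R, treating each line as a single input term and sizing it by how often that line occurs. I tried the simple instructions from the "speech" corpus-format example, but then the word "process" comes out as the most frequent and the largest, which is not what I want. I want the most frequent line to be the largest.
I used the following code from the common examples, but it does not give me what I want: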
library(tm)
library(wordcloud)
GO <- Corpus(DirSource("/home/student-a/Desktop/Untitled Folder/"))
wordcloud(GO)
How can I do this?
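This works for the example, but it uses wordcloud2 rather than wordcloud. wordcloud issues a warning when a word is too long to fit. Note that wordcloud2 is not very fast at drawing either, and you need to open the viewer to see the result.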
anatomical_structure_formation_involved_in_morphogenesis could not be
fit on page. It will not be plotted.
Code using wordcloud2:
library(wordcloud2)
library(dplyr)
text <- c("anatomical_structure_development",
"nucleic_acid_binding_transcription_factor_activity",
"molecular_function",
"biological_process",
"biosynthetic_process",
"cellular_nitrogen_compound_metabolic_process",
"embryo_development",
"anatomical_structure_formation_involved_in_morphogenesis",
"immune_system_process",
"biosynthetic_process",
"cellular_nitrogen_compound_metabolic_process",
"embryo_development")
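# wordcloud2 needs a data frame with the word in the first column and its
# frequency in the second. count() builds that table from the text.
# (data_frame() is deprecated; tibble() is its replacement.)
df <- tibble(words = text) %>%
  count(words, name = "freq") %>%
  as.data.frame()  # plain data.frame, in case wordcloud2 does not accept a tibble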
wordcloud2(df)
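For the real file, the same frequency table can be built straight from the lines on disk instead of a hand-typed vector. The following is only a minimal sketch using base R: the temporary file stands in for the actual one-column file (its path and contents here are made up for illustration), and the names `path`, `terms`, `counts` and `freq_df` are hypothetical. The plotting step is guarded so that the counting part runs even where wordcloud2 is not installed.

```r
# Stand-in for the real one-column file (hypothetical contents);
# replace `path` with the location of your own file.
path <- tempfile(fileext = ".txt")
writeLines(c("embryo_development", "biosynthetic_process",
             "embryo_development", "molecular_function",
             "embryo_development", "biosynthetic_process"), path)

# Read each line as one whole term and drop empty lines.
terms <- readLines(path)
terms <- terms[nzchar(trimws(terms))]

# Count how often each complete line occurs, most frequent first.
counts <- sort(table(terms), decreasing = TRUE)
freq_df <- data.frame(word = names(counts),
                      freq = as.integer(counts),
                      stringsAsFactors = FALSE)

# Build the cloud only if wordcloud2 is available; print(cloud) shows it in the viewer.
if (requireNamespace("wordcloud2", quietly = TRUE)) {
  cloud <- wordcloud2::wordcloud2(freq_df)
}
```

Because each line is counted as a single string, "process" never gets split out as a separate word; the size of a term depends only on how many times that exact line appears.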