What's making the text in this corpus lowercase, and how can I make it uppercase?
I'm trying to build a word cloud in R, but it only returns lowercase text.
library(readxl)
library(tm)
library(wordcloud)
library(RColorBrewer)
sheet <- read_excel('list_products.xls', skip = 4)
products <- c(sheet$Cod)
products <- Corpus(VectorSource(products))
c_words <- brewer.pal(8, 'Set2')
wordcloud(products, min.freq = 10, max.words = 30, scale = c(7,1), colors = c_words)
I tried putting the following line before the wordcloud call, but it didn't work:
products <- tm_map(products, content_transformer(toupper))
What is making the text lowercase, and what should I do to make it uppercase?
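Well, as you can see here, when you run TermDocumentMatrix(CORPUS) the words are converted to lowercase by default (lowercasing is controlled by the tolower control option, which is on unless you turn it off).
In fact, if you look at the source of wordcloud (for example with trace(wordcloud, edit = TRUE)), you'll see that when you call it without the freq argument it runs tdm <- tm::TermDocumentMatrix(corpus), so your words end up lowercase.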
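A minimal sketch of that default, using a made-up two-document corpus (the docs vector is purely illustrative, not from the question): the same text yields lowercase terms with the default control and case-preserved terms with tolower = FALSE.

```r
library(tm)

# Illustrative toy data, not from the question
docs <- c("APPLE Banana apple", "Banana CHERRY")
corpus <- Corpus(VectorSource(docs))

# Default control: terms are lowercased, so "APPLE" and "apple"
# collapse into the single term "apple"
tdm_default <- TermDocumentMatrix(corpus)
print(Terms(tdm_default))

# tolower = FALSE keeps each spelling as its own term
tdm_keep <- TermDocumentMatrix(corpus, control = list(tolower = FALSE))
print(Terms(tdm_keep))
```

The first matrix should contain only apple, banana and cherry, while the second keeps APPLE, apple, Banana and CHERRY apart.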
You have two ways to solve this.
The first is to pass the words and their frequencies instead of the corpus:
library(tm)
library(wordcloud)
library(RColorBrewer)
filePath <- "http://www.sthda.com/sthda/RDoc/example-files/martin-luther-king-i-have-a-dream-speech.txt" # I am using this text because you did not provide a reproducible example
text <- readLines(filePath)
products <- Corpus(VectorSource(text))
# content_transformer() uppercases the text while keeping the corpus structure intact
products <- tm_map(products, content_transformer(toupper))
c_words <- brewer.pal(8, 'Set2')
# tolower = FALSE stops TermDocumentMatrix from lowercasing the terms again
tdm <- tm::TermDocumentMatrix(products, control = list(tolower = FALSE))
freq_corpus <- slam::row_sums(tdm)
wordcloud(names(freq_corpus), freq_corpus, min.freq = 10, max.words = 30, scale = c(7,1), colors = c_words)
This gives you a word cloud with the words in uppercase.
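If all you need is a cloud of the product codes themselves, tm is not strictly required. Here is a minimal sketch that swaps in base R's table() for the corpus machinery, assuming each cell of sheet$Cod holds a single code. The codes vector below is a hypothetical stand-in for that column.

```r
library(wordcloud)
library(RColorBrewer)

# Hypothetical stand-in for sheet$Cod from the question
codes <- c("ab12", "AB12", "cd34", "ab12", "ef56", "cd34")

# Uppercase first, then count each distinct code; no corpus or
# TermDocumentMatrix is involved, so nothing lowercases the labels again
freq <- table(toupper(codes))

wordcloud(names(freq), as.numeric(freq), min.freq = 1,
          scale = c(4, 1), colors = brewer.pal(8, "Set2"))
```

If a cell can contain several words, split them before counting, for example with unlist(strsplit(codes, "\\s+")).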
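The second is to modify wordcloud itself.
With the wordcloud package loaded, first run trace(wordcloud, edit = TRUE),
then replace line 21, tdm <- tm::TermDocumentMatrix(corpus), with:
tdm <- tm::TermDocumentMatrix(corpus, control = list(tolower = FALSE))
Save the change, then run: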
library(tm)
library(RColorBrewer)
filePath <- "http://www.sthda.com/sthda/RDoc/example-files/martin-luther-king-i-have-a-dream-speech.txt"
text <- readLines(filePath)
products <- Corpus(VectorSource(text))
products <- tm_map(products, content_transformer(toupper))
c_words <- brewer.pal(8, 'Set2')
# Pass the corpus itself: the patched wordcloud now builds its matrix with tolower = FALSE
wordcloud(products, min.freq = 10, max.words = 30, scale = c(7,1), colors = c_words)
You get a similar uppercase word cloud.
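Keep in mind that trace(wordcloud, edit = TRUE) only patches the function for the current R session, and untrace(wordcloud) restores the original. If you would rather not edit the package function at all, the first approach can be wrapped in a small reusable helper. The name wordcloud_keep_case below is hypothetical and is not part of tm or wordcloud; this is a sketch, not a drop-in replacement for every wordcloud option.

```r
library(tm)
library(wordcloud)

# Hypothetical helper: count terms without lowercasing, then pass
# words + frequencies to wordcloud() so it never builds its own matrix
wordcloud_keep_case <- function(corpus, ...) {
  tdm <- TermDocumentMatrix(corpus, control = list(tolower = FALSE))
  freq <- slam::row_sums(tdm)
  wordcloud(names(freq), freq, ...)
  invisible(freq)  # return the counts so they can be inspected
}

# Illustrative usage on a toy corpus
docs <- c("ALPHA BETA ALPHA", "BETA GAMMA ALPHA")
corpus <- Corpus(VectorSource(docs))
freq <- wordcloud_keep_case(corpus, min.freq = 1, colors = "black")
```

Any extra arguments (min.freq, max.words, scale, colors) are forwarded to wordcloud() unchanged.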