Generic way to avoid special characters in R

Below is a series of email subject lines, stored in a data.frame (DF). Please note that I imported the data from an Excel sheet.

    Buy the stunning new phone
    The game changer is here.
    Experience a phone ahead of its time.
    Thank You Chennai
    Buy a phone at 10000 and get a new sim free
    Buy the stunning new phone
I created a term-document matrix in R using the following code:

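    library(tm)

    # Build a corpus from the subject lines (the column name `subject` is assumed)
    mycorpus <- Corpus(VectorSource(DF$subject))
    mycorpus <- tm_map(mycorpus, removeNumbers)
    mycorpus <- tm_map(mycorpus, content_transformer(tolower))
    mycorpus <- tm_map(mycorpus, removeWords, stopwords("english"))

    # Create a term-document matrix and count how often each term occurs
    dtm <- TermDocumentMatrix(mycorpus)
    m <- as.matrix(dtm)
    v <- sort(rowSums(m), decreasing = TRUE)
    d <- data.frame(word = names(v), freq = v)
    head(d, 10)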


              word freq
               get   45
              free   44
              edge   35
               new   29
               buy   24
           charger   23
          wireless   23
              just   21
             month   21
               per   21
          starting   21
          stunning   21
               pro   20
               now   17
            offers   17
              gear   16
         exclusive   15
             offer   14
              gift   13
      irresistible   10
             loved   10
     valentine’s   10

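I do get the term-document matrix. However, some words show up with special characters only in the term-document matrix; those characters are not present in the original data frame. I have tried adjusting the encoding, and I have also removed the stray characters manually with gsub. Is there a way to keep the words from my Excel sheet from being turned into special characters?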

gsub("€™", "", d$word)



Encoding(x) <- "UTF-8"

iconv(dtm, "UTF-8", "ASCII", sub="")
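
One possible reading of the output above: `’` is what the three UTF-8 bytes of the typographic apostrophe (U+2019) look like when they are decoded as Windows-1252, which would mean the subject lines were read with the wrong encoding somewhere along the way. The sketch below uses base R only to reproduce that garbling and then reverse it. `fix_mojibake` is a hypothetical helper name, and the CP1252 guess is an assumption that has not been checked against the actual Excel file.

```r
# The typographic apostrophe U+2019 is the bytes E2 80 99 in UTF-8.
good <- "valentine\u2019s"

# Simulate the garbling: treat the UTF-8 bytes as if they were Windows-1252.
garbled <- iconv(good, from = "CP1252", to = "UTF-8")

# Hypothetical helper: undo the mis-decoding where that is possible and
# leave every other string untouched.
fix_mojibake <- function(x) {
  # Map each character back to the single CP1252 byte it came from.
  out <- iconv(x, from = "UTF-8", to = "CP1252")
  ok <- !is.na(out)            # iconv returns NA when a string cannot be converted
  Encoding(out) <- "UTF-8"     # reinterpret the recovered bytes as UTF-8
  ok <- ok & validUTF8(out)    # accept only results that are valid UTF-8
  x[ok] <- out[ok]
  x
}

repaired <- fix_mojibake(garbled)

# Optionally replace the typographic apostrophe with a plain ASCII one.
plain <- gsub("\u2019", "'", repaired, fixed = TRUE)
print(plain)
```

If this matches the data, applying the helper to the subject lines before building the corpus, or reading the file with the correct encoding in the first place, may be more reliable than deleting the stray characters with gsub afterwards.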