How to upload a folder of text files in a Shiny app to get a Document Term Matrix from a corpus of files in R?

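I want to upload a folder of text files from my system in a Shiny app, build a Document Term Matrix from the resulting corpus, and apply K-means to it.
I have tried several approaches, but I cannot combine all the uploaded files into a single corpus.
I can apply K-means when I create the corpus in the global environment, but I want to do it through the Shiny app by uploading a folder or selecting multiple files.

Here is the code I have so far: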

library(shiny)
library(shinydashboard)
library(shinythemes)
library(shinyFiles)
library(tm)

ui <- dashboardPage(
  dashboardHeader(title = "Document_Clustering"),
  dashboardSidebar( 
    sidebarMenu(
        menuItem("Data Processing", tabName = "DP", icon = icon("info-circle")),
        menuItem("K-Means", tabName = "KMeans", icon = icon("th"))
)),
  dashboardBody(
    tabItems(
      tabItem(tabName = "DP",
         fluidRow(
          box(fileInput('file1', 'Choose Files',
                       accept=c('text/csv',
                               'text/comma-separated-values,text/plain',
                              '.csv'), multiple = TRUE)
          ,  solidHeader = TRUE))
   ,fluidRow(
    box(title = "Pre-processing",  width = 15 ,tableOutput('proc'))
  )

  ),


  tabItem(tabName = "KMeans",
          fluidRow(
            box(
              title = "Enter Number of Clusters:",
              selectInput("C", choices =c(seq(1 , 15, 1)),label = NULL ,selected = 1), solidHeader = TRUE
            )),
          fluidRow(box(title = "Cluster", width = 9, textOutput("cluster1"))),
          fluidRow(box(title = "Cluster Size", width = 9, textOutput("size1"))),
          fluidRow(box(title= "Between Cluster Hetrogeneity" , width=9, textOutput("hetro1")))

  )
)))

server <- shinyServer(function(input, output, session){
  myData <- reactive({
    inFile <- input$file1
    if (is.null(inFile)) return(NULL)

con<- file(inFile$datapath, open="rt", encoding = "UTF-8")
text<-readLines(con)
msg<- paste(text, collapse = "\n")
close(con)
msg<- msg


myCorpus <- Corpus(VectorSource(msg))
myCorpus <- tm_map(myCorpus, tolower)
myCorpus <- tm_map(myCorpus, PlainTextDocument)
myCorpus<- tm_map(myCorpus,removePunctuation)
myCorpus <- tm_map(myCorpus, removeNumbers)
myCorpus <- tm_map(myCorpus, removeWords,stopwords("english"))
myCorpus <- tm_map(myCorpus, stripWhitespace)
dtm <- DocumentTermMatrix(myCorpus,control = list(minWordLength = 1))
dtm_tfxidf <- weightTfIdf(dtm)
m11 <- as.matrix(dtm_tfxidf)
ri <- m11


set.seed(1234)
### Only kmeans
n2 <- input$C
clusk <- kmeans(as.data.frame(ri), n2) #, nstart = 9)

T3<- list(Name= m11, Cluster_K=clusk$cluster, Size_K= clusk$size, Hetro_K=clusk$betweenss/clusk$totss*100)
  })

  output$proc <- renderTable({
    myData()$Name
  })

  output$cluster1 <- renderText({
    myData()$Cluster_K

  })

  output$size1 <- renderText({
    myData()$Size_K

  })

  output$hetro1 <- renderText({
    myData()$Hetro_K
  })

  })

shinyApp(ui= ui, server = server)  

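With the code above I can upload multiple files, but further processing fails with an error I cannot resolve: Error: invalid 'description' argument.
Also, when I upload only a single file everything seems to work, but I don't understand why kmeans reports a cluster size of 2 for a single file.

Any help is greatly appreciated.
Thanks in advance!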

The uploaded files cannot be combined without a function that reads each one separately, and that function was missing from my code.
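The error appears because `file()` expects exactly one path, while `input$file1$datapath` holds one path per uploaded file. Below is a minimal base R sketch of the per-file reading, using temporary files as stand-ins for the Shiny uploads. The helper name `read_one` and the sample texts are made up for illustration.

```r
# Stand-ins for the temp files Shiny creates for each upload (hypothetical contents).
paths <- c(tempfile(fileext = ".txt"), tempfile(fileext = ".txt"))
writeLines(c("first document", "about cats"), paths[1])
writeLines(c("second document", "about dogs"), paths[2])

# file() takes a single path, so passing the whole vector raises an error;
# the error message is captured here instead of stopping the script.
res <- tryCatch(file(paths, open = "rt"), error = function(e) conditionMessage(e))

# Read each file on its own and collapse its lines into one string per document.
read_one <- function(path) paste(readLines(path, warn = FALSE), collapse = "\n")
all_docs <- vapply(paths, read_one, character(1), USE.NAMES = FALSE)

length(all_docs)  # one element per file
```

A character vector like `all_docs` is the kind of input `VectorSource()` turns into one document per element.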

To make the Shiny app work, make the following changes in the server section:

Replace this

con<- file(inFile$datapath, open="rt", encoding = "UTF-8")
text<-readLines(con)
msg<- paste(text, collapse = "\n")
close(con)
msg<- msg

myCorpus <- Corpus(VectorSource(msg))

with this

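# Read one file and return its body as a single string.
get.msg <- function(path)
{
  con <- file(path, open = "rt", encoding = "latin1")
  text <- readLines(con)
  # Keep only the lines after the first blank line (assumes each file contains one).
  msg <- text[seq(which(text == "")[1] + 1, length(text), 1)]
  close(con)
  return(paste(msg, collapse = "\n"))
}

# One temporary path per uploaded file.
data.docs <- inFile$datapath
data.docs <- data.docs[which(data.docs != "cmds")]
# Read every file; the result holds one string per document.
all.data <- sapply(data.docs,
                   function(p) get.msg(file.path(p)))

myCorpus <- Corpus(VectorSource(all.data))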
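
One caveat about `get.msg`: the expression `text[seq(which(text == "")[1] + 1, length(text), 1)]` keeps only the lines after the first blank line, which suits files that start with a header block. For a file with no blank line, `which(text == "")[1]` is `NA` and `seq()` stops with an error. A guarded variant could look like the sketch below. The helper name `strip_header`, the fallback of keeping the whole file, and the sample lines are assumptions and are not part of the original answer.

```r
# Hypothetical helper: return the lines after the first blank line,
# or all lines when the text contains no blank line (assumed fallback).
strip_header <- function(text) {
  first_blank <- which(text == "")[1]
  if (is.na(first_blank)) return(text)
  text[seq_along(text) > first_blank]
}

# Made-up sample inputs: one with a header block, one without.
with_header <- c("From: someone", "Subject: test", "", "body line 1", "body line 2")
no_header   <- c("body line 1", "body line 2")

strip_header(with_header)  # lines after the blank separator
strip_header(no_header)    # returned unchanged
```

Inside `get.msg`, the slicing line could then be written as `msg <- strip_header(text)`.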