Partial match (pmatch) not working on shinyapps.io

Question

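I have been developing a web scraping application to collect some information from JSTOR. The application works fine locally, but it does not work when deployed to shinyapps.io.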

The idea is simple: the app downloads an HTML page (for example, https://www.jstor.org/action/doBasicSearch?Query=example&acc=off&wc=on&fc=off&group=none) and reads the list on the side, which shows the number of hits for each discipline.

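library(rvest)  # makes read_html(), html_nodes() and html_text() available

webpage <- read_html(filePath)           # parse the downloaded page
hits_html <- html_nodes(webpage, 'li')   # select every list item
hits <- html_text(hits_html)             # extract the text of each item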

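To build the data frame, the app takes a list of selected disciplines and partially matches each one against the text extracted from the page. This yields the index of each selected discipline in the page's list of disciplines, as follows: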

disciplines <- list("\r\n                African American Studies",
                    "\r\n                African Studies",
                    "\r\n                Agriculture",
                    "\r\n                American Studies",
                    "\r\n                Anthropology",
                    "etc...")

index <- pmatch(disciplines[[i]], hits)

string <- hits[index]

The string for the selected discipline is then converted to a number as follows:

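begin <- regexpr("\\(", string)  # position of the opening parenthesis
end <- regexpr("\\)", string)    # position of the closing parenthesis

k <- substring(string, begin + 1, end - 1)  # text between the parentheses
k <- gsub(",", "", k)                       # drop every thousands separator
k <- as.numeric(k)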

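This works fine locally but not on shinyapps.io. After many tests, I noticed that the problem lies in the pmatch function (or any other matching function I tried). The matching functions return NA when used on shinyapps.io, while they work fine locally. I have already tried the following alternatives: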

index <- pmatch(disciplines[[t]], as.list(hits)) # DOES NOT WORK ON SHINYAPPS.IO
index <- pmatch(disciplines[[t]], hits) # DOES NOT WORK ON SHINYAPPS.IO
index <- which(stringr::str_detect(hits, disciplines[[t]]))[[1]] # DOES NOT WORK ON SHINYAPPS.IO
index <- sjmisc::str_find(hits, disciplines[[t]])[[1]] # DOES NOT WORK ON SHINYAPPS.IO

Has anyone run into a similar problem?

Answer 1

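Arthur! This seems to be related to the line separator in the expected strings you are matching against. Strings that start with "\r\n" will only match HTML obtained when running in a Windows environment. If your server is Unix-based, they will not match, because the line separator there is "\n".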
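The mismatch can be reproduced with a small sketch. The sample strings and the hit count 1,234 are made up for illustration; only the "\r\n" versus "\n" difference comes from the explanation here, and it is an assumption that this is what the deployed app actually receives.

```r
# Hypothetical sample data: the same list item as it might arrive on each platform.
hits_windows <- c("\r\n                African Studies (1,234)")
hits_unix    <- c("\n                African Studies (1,234)")

# Expected string as written in the question, with a Windows line separator.
pattern <- "\r\n                African Studies"

idx_windows <- pmatch(pattern, hits_windows)  # pattern is a prefix of the element
idx_unix    <- pmatch(pattern, hits_unix)     # no element starts with "\r", so no match

print(idx_windows)
print(idx_unix)
```

With this made-up data, the first call finds the element and the second returns NA, which is the symptom described in the question.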

Try removing \r from the expected strings and re-running your application.
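One way to follow this advice without hard-coding either separator is to trim surrounding whitespace on both sides before comparing. This is only a sketch: `find_discipline` is a hypothetical helper, not part of the original app, and the sample `hits` values are invented.

```r
# Hypothetical helper: index of the first element of `hits` whose text,
# after trimming surrounding whitespace, starts with the discipline name.
find_discipline <- function(discipline, hits) {
  clean_hits <- trimws(hits)        # drops leading/trailing spaces, tabs, "\r", "\n"
  clean_name <- trimws(discipline)
  which(startsWith(clean_hits, clean_name))[1]  # NA when nothing matches
}

# Invented sample data using Unix-style line separators.
hits <- c("\n                African American Studies (10)",
          "\n                African Studies (1,234)")

# The expected string still carries "\r\n", but trimming makes that irrelevant.
index <- find_discipline("\r\n                African Studies", hits)
print(index)
```

Unlike pmatch, this helper returns the first hit when several entries share the same prefix, so it may need tightening if discipline names overlap.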
