RserveException: eval failed Syntax error

I have an R function that strips all HTML markup from an HTML page. It works when I run it in R, but when I run it through Rserve it produces this error:

Exception in thread "main" org.rosuda.REngine.Rserve.RserveException: eval failed, request status: R parser: syntax error

at org.rosuda.REngine.Rserve.RConnection.eval(RConnection.java:234)
at CereScope_Data.main(CereScope_Data.java:80)

The Java eval call where I get the error:

REXP lstrRemoveHtml = cobjConn.eval("RemoveHtml('" + lstrRawData + "')");

My R function (rawdata is an HTML page):

RemoveHtml <- function(rawdata) {
  
  library("tm")
  
  ## Converting Data To UTF-8 Format
  ## Creating Corpus
  Encoding(rawdata) <- "latin1"
  docs <- Corpus(VectorSource(iconv(rawdata, from = "latin1", to = "UTF-8", sub = "")))
  
  toSpace <- content_transformer(function(x , pattern) gsub(pattern, " ", x))
  
  docs <- gsub("[^\b]*(<style).*?(</style>)", " ", docs)
  docs <- Corpus(VectorSource(gsub("[^\b]*(<script).*?(</script>)", " ", docs)))
  docs <- tm_map(docs, toSpace, "<.*?>")
  docs <- tm_map(docs, toSpace, "(//).*?[^\n]*")
  docs <- tm_map(docs, toSpace, "/")
  docs <- tm_map(docs, toSpace, "\\t")
  docs <- tm_map(docs, toSpace, "\\n")
  docs <- tm_map(docs, toSpace, "\\\\")
  docs <- tm_map(docs, toSpace, "@")
  docs <- tm_map(docs, toSpace, "\\|")
  
  docs <- tm_map(docs, toSpace, "\"")
  docs <- tm_map(docs, toSpace, ",")
  RemoveHtmlDocs <- tm_map(docs, stripWhitespace)
  
  return(as.character(RemoveHtmlDocs)[1])
}

Update - Things I tried already

  1. Escaping characters that may cause problems, such as single quotes, double quotes, and backslashes
  2. Assigning the whole data to an R variable through eval and then running the function on that variable

New update - Problem solved

  1. Escape characters such as single quotes, double quotes, and backslashes were causing the problem.
  2. Another line that was no longer necessary was also causing the problem, because I had not commented it out or removed it.

Thanks, everyone!! :) Check my answer for a description!! :)

The error is in

REXP lstrRemoveHtml = cobjConn.eval("RemoveHtml('" + lstrRawData + "')");

In Java, \ is an escape character, so writing \" inside a Java string literal puts a plain " into the string, where it acts as the string delimiter of the R expression.

Solution: build the expression by appending lstrRawData like this before passing it to eval:

String exp = "RemoveHtml(\"" + lstrRawData + "\")";
REXP lstrRemoveHtml = cobjConn.eval(exp);
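
Changing the delimiter only helps as long as the page does not contain that same quote character. The following is a minimal sketch that needs no Rserve connection; the class name QuoteDemo, the helper buildCall, and the sample page are made up for illustration. It prints the expression R would receive for each delimiter so the unbalanced quotes can be seen directly.

```java
public class QuoteDemo {
    // Hypothetical helper: builds the R call by plain concatenation,
    // wrapping the raw data in the given quote character.
    public static String buildCall(String rawData, char quote) {
        return "RemoveHtml(" + quote + rawData + quote + ")";
    }

    public static void main(String[] args) {
        // Made-up sample input containing both kinds of quotes.
        String page = "<p class=\"x\">it's</p>";

        // Single-quoted: the apostrophe in "it's" ends the R string early.
        System.out.println(buildCall(page, '\''));

        // Double-quoted: the quotes around x end the R string early.
        System.out.println(buildCall(page, '"'));
    }
}
```

For real HTML, which usually contains both kinds of quotes, neither delimiter is safe unless the data is escaped first.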

The escape characters were the problem. To fix it, I escaped the backslashes and the quotes. I wrote this method to make that simpler:

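public static String Regexer(String Data) {
    // String.replace matches literally (no regex). Backslashes are escaped first
    // so that the backslashes added in front of the quotes are not doubled again.
    String RegexedData = Data.replace("\\", "\\\\").replace("'", "\\'").replace("\"", "\\\"");
    return (RegexedData);
}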

In the function above I escaped the escape characters once more so that they are still escaped when they reach the R function.
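
The effect of this escaping can be checked without an Rserve connection. The following is a minimal sketch, assuming literal (non-regex) replacement with String.replace; the class name EscapeDemo, the helper escapeForR, and the sample input are hypothetical. It prints the expression that would be handed to eval.

```java
public class EscapeDemo {
    // Hypothetical helper: escape backslashes first, then both quote
    // characters, so the data survives inside a quoted R string literal.
    public static String escapeForR(String data) {
        return data.replace("\\", "\\\\")
                   .replace("'", "\\'")
                   .replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // Made-up sample input: it's a "test" \ path
        String raw = "it's a \"test\" \\ path";

        // Build the expression the same way the eval call does.
        String expr = "RemoveHtml('" + escapeForR(raw) + "')";

        // Prints: RemoveHtml('it\'s a \"test\" \\ path')
        System.out.println(expr);
    }
}
```

The order matters: if the quotes were escaped before the backslashes, the backslashes just added in front of the quotes would be doubled by the later step and the quotes would end up unescaped again.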

Tip: don't forget to convert the REXP to a Java variable. :)