Is there a way to use dapply to do pattern matching and replacement across multiple columns of a SparkR DataFrame?

I'm running Spark 2.0 locally.

df <- data.frame(a = c("$0.00 ", "1.19 ", "1.19 ", "8.58 "),
             b = c("8.81 ", "6.85", ".37 ", ".37 "),
             c = c("8.58 ", "1.15 ", "2.30 ", "0.30")
             )

ddf <- as.DataFrame(df)

I'm looking to run something like this:

ddf2 <- dapply(ddf, function(x) { regexp_replace(x, "\\$|,", "")}, schema(ddf))

but it returns an error:

head(ddf2)
ERROR Executor: Exception in task 0.0 in stage 13.0 (TID 13)
org.apache.spark.SparkException: R computation failed with
Error in (function (classes, fdef, mtable)  : 
unable to find an inherited method for function ‘regexp_replace’ for signature ‘"data.frame", "character", "character"’

Using dapply:

ddf2 <- dapply(ddf, function(x) { as.data.frame(apply(x, MARGIN=2, function(y) gsub("\\$|,", "", y, perl=TRUE)), stringsAsFactors = FALSE) } , schema(ddf))

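dapply expects the anonymous function to return an R data.frame.

The regexp_replace method, by contrast, takes a SparkDataFrame Column as input, which is why calling it on the R data.frame inside dapply fails with the dispatch error above.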
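
Because the function passed to dapply only ever sees a plain R data.frame, it can be written and tried in base R first. The following is a minimal sketch, not a definitive implementation: strip_currency is a hypothetical helper name, the sample values are made up, and lapply is swapped in for apply on the assumption that it is safer when a partition holds a single row (apply would then return a bare vector instead of one value per column).

```r
# Hypothetical helper: strip "$" and "," from every column of a plain R data.frame.
strip_currency <- function(pdf) {
  # lapply yields one result per column, so the data.frame keeps its shape
  pdf[] <- lapply(pdf, function(col) gsub("\\$|,", "", col, perl = TRUE))
  pdf
}

# Made-up sample values, used only to try the helper locally
local_df <- data.frame(a = c("$1,000.50", "8.58 "),
                       b = c("$3.37 ", "6.85"),
                       stringsAsFactors = FALSE)
cleaned <- strip_currency(local_df)
print(cleaned)

# With a Spark session running, the same function could be passed to dapply:
# ddf2 <- dapply(ddf, strip_currency, schema(ddf))
```

Since the helper returns a data.frame with the same column names and types as its input, schema(ddf) remains a valid output schema.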

Example without dapply (this replaces values in column a only):

withColumn(ddf,'a', regexp_replace(ddf$a, "\\$|,", ""))
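
The single-column call can be extended to several columns by looping over the column names, which keeps the work in Spark and avoids dapply altogether. This is a sketch under stated assumptions: SparkR is installed with a local Spark available, every column is a string column, and the sample values are made up for illustration.

```r
library(SparkR)
sparkR.session(master = "local[1]")  # assumes a local Spark installation

# Made-up sample data for illustration
df <- data.frame(a = c("$1,000.50", "8.58 "),
                 b = c("$3.37 ", "6.85"),
                 stringsAsFactors = FALSE)
ddf <- as.DataFrame(df)

# Apply regexp_replace to each column in turn;
# withColumn replaces an existing column that has the same name
for (col_name in columns(ddf)) {
  ddf <- withColumn(ddf, col_name,
                    regexp_replace(ddf[[col_name]], "\\$|,", ""))
}

result <- collect(ddf)
print(result)
```

To touch only some columns, loop over a subset such as c("a", "c") instead of columns(ddf).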