Is there a way to use dapply to do pattern matching and replacement across multiple columns of a SparkR DataFrame?
Running Spark 2.0 locally:
df <- data.frame(a = c("$0.00 ", "1.19 ", "1.19 ", "8.58 "),
                 b = c("8.81 ", "6.85", ".37 ", ".37 "),
                 c = c("8.58 ", "1.15 ", "2.30 ", "0.30")
)
ddf <- as.DataFrame(df)
I'm looking to run something like this:
ddf2 <- dapply(ddf, function(x) { regexp_replace(x, "\\$|,", "")}, schema(ddf))
but it returns an error:
head(ddf2)
ERROR Executor: Exception in task 0.0 in stage 13.0 (TID 13)
org.apache.spark.SparkException: R computation failed with
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘regexp_replace’ for signature ‘"data.frame", "character", "character"’
Using dapply:
# x is one partition as a local R data.frame; strip "$" and "," from every column
ddf2 <- dapply(ddf, function(x) {
  as.data.frame(apply(x, MARGIN = 2, function(y) gsub("\\$|,", "", y, perl = TRUE)),
                stringsAsFactors = FALSE)
}, schema(ddf))
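dapply expects the anonymous function to return an R data.frame. Inside dapply, x is a local R data.frame, but regexp_replace is a SparkR function that takes a SparkR Column (for example ddf$a) as its input, which is why the call fails with "unable to find an inherited method ... for signature "data.frame"". Base R's gsub works on the local data instead.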
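A variant of the same per-partition function, sketched under the assumption that every column is a string. Assigning through x[] <- lapply(...) keeps the result a data.frame with the original columns, whereas apply(x, 2, ...) collapses to a plain vector when a partition holds exactly one row, so as.data.frame() would then return a single column with one row per original column and no longer match schema(ddf). The name strip_currency and the sample values are made up for illustration; the function can be checked locally without Spark.

```r
# strip_currency is a hypothetical name for the function handed to dapply.
# dapply calls it with one partition as a local R data.frame and expects a
# data.frame with the same columns back.
strip_currency <- function(x) {
  # Replace each column in place; x[] <- keeps the data.frame class,
  # column names and row count, even for a one-row partition.
  x[] <- lapply(x, function(y) gsub("\\$|,", "", as.character(y)))
  x
}

# Made-up sample partition to check the function without Spark.
part <- data.frame(a = c("$1,234.00", "$0.37"),
                   b = c("$8.81", "6.85"),
                   stringsAsFactors = FALSE)
print(strip_currency(part))
print(strip_currency(part[1, , drop = FALSE]))  # one-row partition keeps its shape

# With SparkR attached, the same function would be passed to dapply as:
# ddf2 <- dapply(ddf, strip_currency, schema(ddf))
```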
An example without dapply (replacing the values of column a only):
withColumn(ddf, 'a', regexp_replace(ddf$a, "\\$|,", ""))
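The same Column-level approach can be extended to every column without dapply by looping over columns(ddf) and calling withColumn once per column. This is a sketch assuming a local Spark 2.x installation reachable from SparkR; the loop variable cn is illustrative, and the loop relies on withColumn replacing an existing column of the same name, as the single-column example does.

```r
library(SparkR)
# Assumes a local Spark 2.x installation that SparkR can reach.
sparkR.session(master = "local[*]")

df <- data.frame(a = c("$0.00 ", "1.19 ", "1.19 ", "8.58 "),
                 b = c("8.81 ", "6.85", ".37 ", ".37 "),
                 c = c("8.58 ", "1.15 ", "2.30 ", "0.30"),
                 stringsAsFactors = FALSE)
ddf <- as.DataFrame(df)

# Apply the same Column-level regexp_replace to every column, writing each
# result back under its original name (withColumn overwrites that column).
ddf2 <- ddf
for (cn in columns(ddf2)) {
  ddf2 <- withColumn(ddf2, cn, regexp_replace(ddf2[[cn]], "\\$|,", ""))
}
head(ddf2)
```

Because regexp_replace is evaluated by Spark itself, this keeps the work in the JVM rather than shipping each partition to an R worker process, as dapply does.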