SparkR summarize function called within another function

Question

In SparkR, I want to count how many times each distinct value occurs in a column of a DataFrame, so I wrote a function:

count_spark <- function(df, col) {
  newCol <- paste0('N_', col)
  df %>%
    group_by(.[[col]]) %>%
    summarize(newCol = count(df[[col]]))
}
count_spark(df, 'EventType')

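This does not do what I expected: newCol is interpreted literally as the argument name, so instead of creating a column named N_EventType, it creates a column named newCol.

How can I fix this?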

Answer 1

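Use alias to name the aggregated column. An argument name passed to summarize is taken literally, whereas alias accepts the new name as a string value: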

count_spark <- function(df, col) {
  newCol <- paste0('N_', col)
  df %>%
    group_by(.[[col]]) %>%
    # alias() receives the name as a value, so the contents of newCol are used
    summarize(alias(count(df[[col]]), newCol))
}
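
Why the original version produced a column called newCol can be seen in plain R, without a Spark session: a name written on the left of `=` in a call is used as written and is never evaluated. The sketch below uses only base R, and `name_literal` and `name_from_value` are hypothetical names chosen for illustration, not part of SparkR.

```r
# The intended column name is stored in a variable.
newCol <- paste0("N_", "EventType")

# An argument name written in a call is used literally;
# the variable newCol is not looked up.
name_literal <- names(list(newCol = 1))

# Supplying the name as a value (the way alias takes its second
# argument) uses the contents of the variable instead.
name_from_value <- names(setNames(list(1), newCol))

print(name_literal)     # "newCol"
print(name_from_value)  # "N_EventType"
```

The same distinction applies to summarize(newCol = ...) versus alias(..., newCol) in the answer above.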

r

apache-spark

sparkr