How to pass a DataFrame column name as a parameter to a function in Scala?


I have a function that adds 2 columns:

def sum_num (num1: Int, num2: Int): Int = {
    return num1 + num2
}

I have a DataFrame df with the following values:

+----+----+----+
|col1|col2|col3|
+----+----+----+
|1   |2   |5   |
|7   |4   |4   |
+----+----+----+

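I want to add a column by passing column names to the function, but the code below does not work. It fails with a type mismatch error: found Column, required Int.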

val newdf = df.withColumn("sum_of_cols1", sum_num($"col1", $"col2"))
              .withColumn("sum_of_cols2", sum_num($"col1", $"col3"))

Change your code to:

import org.apache.spark.sql.Column
import spark.implicits._

// Take and return Spark SQL Columns instead of Ints
def sum_num(num1: Column, num2: Column): Column = {
  num1 + num2
}

val newdf = df.withColumn("sum_of_cols1", sum_num($"col1", $"col2"))
  .withColumn("sum_of_cols2", sum_num($"col1", $"col3"))

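You have to operate on Spark SQL Columns rather than on Ints: $"col1" is a Column expression, not the value stored in a row. Columns support arithmetic directly, so num1 + num2 builds a new Column. Have a look at the operators available on Column.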
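
If you would rather pass the column names as plain strings, as the title suggests, one option is to build the Column inside the function with col from org.apache.spark.sql.functions. The following is only a minimal sketch: the helper name sumCols and the local SparkSession setup are assumptions for illustration, not part of the original code, and it is meant to be pasted into spark-shell or a Scala worksheet.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper: accepts column *names* and builds the Column expression itself.
def sumCols(name1: String, name2: String): Column = col(name1) + col(name2)

// Assumed local session for illustration; in spark-shell a session named `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("sum-cols-sketch").getOrCreate()
import spark.implicits._

// Same data as the table in the question.
val df = Seq((1, 2, 5), (7, 4, 4)).toDF("col1", "col2", "col3")

val newdf = df
  .withColumn("sum_of_cols1", sumCols("col1", "col2"))
  .withColumn("sum_of_cols2", sumCols("col1", "col3"))

newdf.show()
```

With the sample data, sum_of_cols1 should contain 3 and 11, and sum_of_cols2 should contain 6 and 11. Passing names keeps the call site free of the $"..." syntax, at the cost of deferring any misspelled-column error to the point where Spark analyzes the query.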