Spark Hive UDF: no handler for UDAF analysis exception
I created a project 'spark-udf' and wrote a Hive UDF as follows:
package com.spark.udf

import org.apache.hadoop.hive.ql.exec.UDF

class UpperCase extends UDF with Serializable {
  def evaluate(input: String): String = {
    input.toUpperCase
  }
}
I built it and created a jar for it, then tried to use this UDF from another Spark program:
spark.sql("CREATE OR REPLACE FUNCTION uppercase AS 'com.spark.udf.UpperCase' USING JAR '/home/swapnil/spark-udf/target/spark-udf-1.0.jar'")
But the following line throws an exception:
spark.sql("select uppercase(Car) as NAME from cars").show
Exception:
Exception in thread "main" org.apache.spark.sql.AnalysisException: No handler for UDAF 'com.spark.udf.UpperCase'. Use sparkSession.udf.register(...) instead.; line 1 pos 7
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.makeFunctionExpression(SessionCatalog.scala:1105)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog$$anonfun$org$apache$spark$sql$catalyst$catalog$SessionCatalog$$makeFunctionBuilder.apply(SessionCatalog.scala:1085)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog$$anonfun$org$apache$spark$sql$catalyst$catalog$SessionCatalog$$makeFunctionBuilder.apply(SessionCatalog.scala:1085)
    at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry.lookupFunction(FunctionRegistry.scala:115)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupFunction(SessionCatalog.scala:1247)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$$anonfun$applyOrElse$$anonfun$applyOrElse.apply(Analyzer.scala:1226)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$$anonfun$applyOrElse$$anonfun$applyOrElse.apply(Analyzer.scala:1226)
    at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
Any help on this is much appreciated.
As mentioned in the comments, it is better to write a Spark UDF instead:
val uppercaseUDF = spark.udf.register("uppercase", (s : String) => s.toUpperCase)
spark.sql("select uppercase(Car) as NAME from cars").show
The main reason is that you did not call enableHiveSupport when creating the SparkSession. In that case the default SessionCatalog is used, and its makeFunctionExpression function only scans for user-defined aggregate functions. If the function is not a UDAF, it will not be found.
A Jira task has been created to fix this.
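A minimal sketch of building the session with Hive support enabled, so that the `CREATE OR REPLACE FUNCTION ... USING JAR` path can resolve the Hive UDF (the app name and local master here are illustrative assumptions, not from the original question):

```scala
import org.apache.spark.sql.SparkSession

object HiveUdfDemo {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport makes Spark use the Hive-aware session catalog,
    // which knows how to build expressions from Hive UDF classes.
    val spark = SparkSession.builder()
      .appName("hive-udf-demo")   // hypothetical app name
      .master("local[*]")         // local run, for illustration only
      .enableHiveSupport()        // required for Hive UDF resolution
      .getOrCreate()

    // With Hive support enabled, the permanent function can be
    // registered from the jar and then invoked in SQL.
    spark.sql(
      "CREATE OR REPLACE FUNCTION uppercase AS 'com.spark.udf.UpperCase' " +
      "USING JAR '/home/swapnil/spark-udf/target/spark-udf-1.0.jar'")
    spark.sql("select uppercase(Car) as NAME from cars").show()
  }
}
```

Note that enableHiveSupport requires spark-hive on the classpath and a usable Hive metastore (a local Derby-backed one is created by default).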
The issue was that the class needs to be public. Note that in Scala, top-level classes are public by default, so no modifier is required:
package com.spark.udf

import org.apache.hadoop.hive.ql.exec.UDF

// Public by default in Scala; ensure it is a top-level class
class UpperCase extends UDF with Serializable {
  def evaluate(input: String): String = {
    input.toUpperCase
  }
}