UDF using Java methods breaks on spark
I wrote this code in a Databricks environment, but when I try it in my local environment, it crashes:
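```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import org.apache.spark.sql.functions.udf

// Parse the "yyyy-MM-dd" prefix of the timestamp string and return the day name, e.g. "MONDAY"
val _event_day_of_week = (event_date_of_event: String) => {
  val formatter: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd")
  val dayOfWeek: String = LocalDate.parse(event_date_of_event.substring(0, 10), formatter).getDayOfWeek.toString
  dayOfWeek
}

// `df` and the $"..." syntax (spark.implicits._) come from the surrounding session
val event_day_of_weekUDF = udf(_event_day_of_week)
df.select($"uuid", event_day_of_weekUDF($"event_date_of_event") as "event_day_of_week").first
```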
Error:

```
Exception in thread "main" java.lang.NullPointerException
    at com.faniak.ml.eventBuzz$.delayedEndpoint$com$faniak$ml$eventBuzz(eventBuzz.scala:72)
    at com.faniak.ml.eventBuzz$delayedInit$body.apply(eventBuzz.scala:17)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main.apply(App.scala:76)
    at scala.App$$anonfun$main.apply(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.App$class.main(App.scala:76)
    at com.faniak.ml.eventBuzz$.main(eventBuzz.scala:17)
    at com.faniak.ml.eventBuzz.main(eventBuzz.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
```
Spark version: 2.1
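The problem is not related to the UDF. When prototyping with Apache Spark, do not extend Scala's `App` trait, because it does not work correctly with Spark. Instead of:

```scala
object EventBuzzDataset extends App {
```

you should write the following to make it work:

```scala
object EventBuzzDataset {
  def main(args: Array[String]): Unit = { /* Spark code goes here */ }
}
```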
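Roughly why `extends App` misbehaves: in Scala 2.x (the stack trace above is from Scala 2.11), `App` is built on `DelayedInit`, so the statements in the object's body, including `val` initializers, only run when that object's `main` is invoked. Code that reaches those fields through any other path, for example a Spark closure running where the object was initialized without its `main` being called, can see `null` instead of the expected value. The sketch below reproduces this without Spark; `Holder` and `DelayedInitDemo` are hypothetical names, and the behavior shown assumes Scala 2.x.

```scala
// Assumes Scala 2.x, where scala.App is implemented with DelayedInit.
object Holder extends App {
  // DelayedInit moves this initializer into the deferred body run by Holder.main
  val greeting: String = "hello"
}

object DelayedInitDemo {
  def main(args: Array[String]): Unit = {
    // Holder is initialized, but its deferred body has not run yet: prints "null"
    println(Holder.greeting)
    // Calling Holder.main executes the deferred body and assigns the field
    Holder.main(Array.empty[String])
    // Now prints "hello"
    println(Holder.greeting)
  }
}
```

With a plain `object` and an explicit `main`, the `val` would instead be assigned by the object's constructor, so it is set no matter which entry point touches the object first.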
The issue is described in detail here: https://issues.apache.org/jira/browse/SPARK-4170 and https://github.com/apache/spark/pull/3497
Thanks to @puhlen for the hint!
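Putting the question's UDF into an object with an explicit `main` might look roughly like this. It is a minimal sketch, not the asker's actual program: it assumes Spark 2.1 with `spark-sql` on the classpath, a `local[*]` master for running from the IDE, and a hypothetical two-row DataFrame in place of the real `df`. The `dayOfWeek` helper is also hypothetical; it returns `null` for `null` input, because Spark passes `null` strings straight into a `String => String` UDF and the original `substring` call would throw on them.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

// Plain object with an explicit main method instead of `extends App`
object EventBuzzDataset {

  // Hypothetical helper: day name (e.g. "SUNDAY") from a string starting with "yyyy-MM-dd";
  // null input yields null instead of a NullPointerException
  def dayOfWeek(eventDate: String): String =
    if (eventDate == null) null
    else LocalDate
      .parse(eventDate.substring(0, 10), DateTimeFormatter.ofPattern("yyyy-MM-dd"))
      .getDayOfWeek
      .toString

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("EventBuzzDataset")
      .master("local[*]") // assumption: local run, e.g. from IntelliJ
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows standing in for the real `df`
    val df = Seq(
      ("a1", "2017-01-01 12:00:00"),
      ("a2", "2017-06-01 08:30:00")
    ).toDF("uuid", "event_date_of_event")

    val eventDayOfWeekUDF = udf(dayOfWeek _)

    df.select($"uuid", eventDayOfWeekUDF($"event_date_of_event").as("event_day_of_week"))
      .show()

    spark.stop()
  }
}
```

Keeping the date logic in a method of a plain object means the UDF only depends on code that the object's ordinary constructor sets up, so it should not rely on `main` having run in the JVM that evaluates it.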