错误执行者:阶段 6.0 spark scala 中的任务 0.0 出现异常?

ERROR Executor: Exception in task 0.0 in stage 6.0 spark scala?

我有一个如下所示的 json 文件。

{"name":"method2","name1":"test","parameter1":"C:/Users/test/Desktop/Online.csv","parameter2": 1.0}

我正在加载我的 json 文件。

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val df = sqlContext.read.json("C:/Users/test/Desktop/data.json")
val df1=df.select($"name",$"parameter1",$"parameter2").toDF()
df1.show()

我有如下 3 个功能:

def method1(P1:String, P2:Double) {
val data = spark.read.option("header", true).csv(P1).toDF()
val rs= data.select("CID", "Sc").dropDuplicates("CID", "Sc").withColumn("Rat", lit(P2))
val outPutPath="C:/Users/test/Desktop/output"
rs.coalesce(1).write.format("com.databricks.spark.csv").option("header", "true").save(outPutPath)
}
def method2(P1:String, P2:Double){
val data = spark.read.option("header", true).csv(P1).toDF()
val rs= data.select("CID", "Sc").withColumn("r", lit(P2))
val rs1= rs.filter($"CID" =!= "").groupBy("CID","Sc").agg(sum(rs("r")).alias("R"))
val outPutPath="C:/Users/test/Desktop/output"
rs1.coalesce(1).write.format("com.databricks.spark.csv").option("header", "true").save(outPutPath)
}
def methodn(P1:String, P2:Double) {
println("method 2 printhing")
println(P2)
}

我正在尝试使用以下代码调用上述函数

df1.map( row => (row.getString(0), row.getString(1), row.getDouble(2) ) ).foreach { x =>
      x._1.trim.toLowerCase match {
          case "method1" => method1(x._2, x._3) 
          case "method2" => method2(x._2, x._3)
          case _ => methodn(x._2, x._3)
      }
   } 

根据我的 json 对象,它应该调用 method2 但是当我尝试执行上面的代码时,我遇到了以下错误。

17/11/22 16:15:44 ERROR Executor: Exception in task 0.0 in stage 6.0 (TID 6)
java.lang.NullPointerException
        at $line36.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.method2(<console>:24)
        at $line38.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun.apply(<console>:40)
        at $line38.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun.apply(<console>:37)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at org.apache.spark.rdd.RDD$$anonfun$foreach$$anonfun$apply.apply(RDD.scala:918)
        at org.apache.spark.rdd.RDD$$anonfun$foreach$$anonfun$apply.apply(RDD.scala:918)
        at org.apache.spark.SparkContext$$anonfun$runJob.apply(SparkContext.scala:2062)
        at org.apache.spark.SparkContext$$anonfun$runJob.apply(SparkContext.scala:2062)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
17/11/22 16:15:44 WARN TaskSetManager: Lost task 0.0 in stage 6.0 (TID 6, localhost, executor driver): java.lang.NullPointerException
        at $line36.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.method2(<console>:24)
        at $line38.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun.apply(<console>:40)
        at $line38.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun.apply(<console>:37)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at org.apache.spark.rdd.RDD$$anonfun$foreach$$anonfun$apply.apply(RDD.scala:918)
        at org.apache.spark.rdd.RDD$$anonfun$foreach$$anonfun$apply.apply(RDD.scala:918)
        at org.apache.spark.SparkContext$$anonfun$runJob.apply(SparkContext.scala:2062)
        at org.apache.spark.SparkContext$$anonfun$runJob.apply(SparkContext.scala:2062)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

17/11/22 16:15:44 ERROR TaskSetManager: Task 0 in stage 6.0 failed 1 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 1 times, most recent failure: Lost task 0.0 in stage 6.0 (TID 6, localhost, executor driver): java.lang.NullPointerException
        at method2(<console>:24)
        at $anonfun.apply(<console>:40)
        at $anonfun.apply(<console>:37)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at org.apache.spark.rdd.RDD$$anonfun$foreach$$anonfun$apply.apply(RDD.scala:918)
        at org.apache.spark.rdd.RDD$$anonfun$foreach$$anonfun$apply.apply(RDD.scala:918)
        at org.apache.spark.SparkContext$$anonfun$runJob.apply(SparkContext.scala:2062)
        at org.apache.spark.SparkContext$$anonfun$runJob.apply(SparkContext.scala:2062)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage.apply(DAGScheduler.scala:1487)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage.apply(DAGScheduler.scala:1486)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed.apply(DAGScheduler.scala:814)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed.apply(DAGScheduler.scala:814)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
  at org.apache.spark.util.EventLoop$$anon.run(EventLoop.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
  at org.apache.spark.rdd.RDD$$anonfun$foreach.apply(RDD.scala:918)
  at org.apache.spark.rdd.RDD$$anonfun$foreach.apply(RDD.scala:916)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
  at org.apache.spark.rdd.RDD.foreach(RDD.scala:916)
  at org.apache.spark.sql.Dataset$$anonfun$foreach.apply$mcV$sp(Dataset.scala:2325)
  at org.apache.spark.sql.Dataset$$anonfun$foreach.apply(Dataset.scala:2325)
  at org.apache.spark.sql.Dataset$$anonfun$foreach.apply(Dataset.scala:2325)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
  at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2823)
  at org.apache.spark.sql.Dataset.foreach(Dataset.scala:2324)
  ... 54 elided
Caused by: java.lang.NullPointerException
  at method2(<console>:24)
  at $anonfun.apply(<console>:40)
  at $anonfun.apply(<console>:37)
  at scala.collection.Iterator$class.foreach(Iterator.scala:893)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
  at org.apache.spark.rdd.RDD$$anonfun$foreach$$anonfun$apply.apply(RDD.scala:918)
  at org.apache.spark.rdd.RDD$$anonfun$foreach$$anonfun$apply.apply(RDD.scala:918)
  at org.apache.spark.SparkContext$$anonfun$runJob.apply(SparkContext.scala:2062)
  at org.apache.spark.SparkContext$$anonfun$runJob.apply(SparkContext.scala:2062)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
  at org.apache.spark.scheduler.Task.run(Task.scala:108)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)

请帮我解决这个问题。

你得到 NullPointerException 因为你试图访问 函数 中的 sparkSession(spark) (method1, method2)。但这不是实际问题。 主要问题是您从 dataframemap 函数内部调用这些函数。这是主要问题。

您无法从 transformations 内访问在 transformations 外定义的变量。所有函数都在 transformations 内部调用,Spark 找不到这些函数内部使用的 spark 变量的任何定义。这是获得 nullPointerException.

的主要原因

这个问题的解决方案是从可以访问 spark 变量的地方调用函数,而不是从 transformation 中调用函数。因此,将您的最后一个 transformation 更改为 action 就可以了

val process = df1.map( row => (row.getString(0), row.getString(1), row.getDouble(2) ) ).collect

process.foreach { x =>
  x._1.trim.toLowerCase match {
    case "method1" => method1(x._2, x._3)
    case "method2" => method2(x._2, x._3)
    case _ => methodn(x._2, x._3)
  }
}

希望回答对你有帮助