java.lang.NoClassDefFoundError: org/apache/spark/TaskOutputFileAlreadyExistException
I have read the data from HDFS and analyzed it, but this error occurs when writing. The full error follows:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/TaskOutputFileAlreadyExistException
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:167)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:123)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute(SparkPlan.scala:173)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery(SparkPlan.scala:211)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:208)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:169)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:110)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:109)
at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand(DataFrameWriter.scala:828)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId(SQLExecution.scala:100)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:828)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:309)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:236)
at SparkSQL.SparkHDFS.main(SparkHDFS.java:22)
My code:
SparkSession sparkSession = SparkSession.builder().appName("FirstSQL").master("local").getOrCreate();
Encoder<MovieModal> movieModalEncoder = Encoders.bean(MovieModal.class);
Dataset<MovieModal> data = sparkSession.read().option("inferSchema", true)
.option("header",true)
.csv("hdfs://localhost:8020/data/ratings.csv")
.as(movieModalEncoder);
Dataset<Row> groupData = data.groupBy(new Column("movieID")).count();
groupData.write().format("csv").save("hdfs://localhost:8020/var/groupData2.csv");
If the output directory already exists, you need to pass overwrite (replace the existing directory) or append (add to the existing directory) as the save mode when writing.

Try:
groupData.write().mode("overwrite").format("csv").save("hdfs://localhost:8020/var/groupData2.csv");