Why does a Spark application fail with "executor.CoarseGrainedExecutorBackend: Driver Disassociated"?
When I run a SQL query through spark-submit and spark-sql, the corresponding Spark application always fails with the following error:
15/03/10 18:50:52 INFO util.AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@slave75:60697/user/HeartbeatReceiver
15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave79:35643] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
The above is only one of the errors. I fetched the whole application log with "yarn logs -application application_1425944520319_8102.log" and filtered out the errors as follows:
Line 46: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave09:55156] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 97: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave09:32852] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 149: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave09:45654] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 200: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave10:45702] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 251: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave10:21596] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 302: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave10:58845] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 353: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave13:1697] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 437: 15/03/10 18:52:06 WARN hdfs.DFSClient: error creating legacy BlockReaderLocal. Disabling legacy local reads.
Line 481: 15/03/10 18:52:06 ERROR executor.Executor: Exception in task 3.0 in stage 0.0 (TID 10)
Line 504: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave13:6289] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 556: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave14:37070] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 607: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave14:43424] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 658: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave15:38083] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 710: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave15:3106] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 761: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave15:35533] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 812: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave16:63207] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 863: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave16:11250] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 910: 15/03/10 18:52:09 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
Line 961: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave18:26917] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 1012: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave18:3058] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 1063: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave19:1885] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 1114: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave19:14795] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 1165: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave19:39794] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 1216: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave20:19614] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 1267: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave20:38776] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 1318: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave21:19231] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 1370: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave21:18816] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 1454: 15/03/10 18:52:06 WARN hdfs.DFSClient: error creating legacy BlockReaderLocal. Disabling legacy local reads.
Line 1498: 15/03/10 18:52:06 ERROR executor.Executor: Exception in task 1.0 in stage 0.0 (TID 18)
Line 1524: 15/03/10 18:52:06 ERROR executor.Executor: Exception in task 1.1 in stage 0.0 (TID 28)
Line 1550: 15/03/10 18:52:06 ERROR executor.Executor: Exception in task 1.2 in stage 0.0 (TID 31)
Line 1576: 15/03/10 18:52:06 ERROR executor.Executor: Exception in task 1.3 in stage 0.0 (TID 32)
Line 1602: 15/03/10 18:52:06 ERROR executor.Executor: Exception in task 1.4 in stage 0.0 (TID 33)
Line 1628: 15/03/10 18:52:07 ERROR executor.Executor: Exception in task 1.5 in stage 0.0 (TID 36)
Line 1654: 15/03/10 18:52:07 ERROR executor.Executor: Exception in task 1.6 in stage 0.0 (TID 37)
Line 1680: 15/03/10 18:52:07 ERROR executor.Executor: Exception in task 1.7 in stage 0.0 (TID 39)
Line 1706: 15/03/10 18:52:07 ERROR executor.Executor: Exception in task 1.8 in stage 0.0 (TID 41)
Line 1732: 15/03/10 18:52:07 ERROR executor.Executor: Exception in task 1.9 in stage 0.0 (TID 42)
Line 1755: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave22:24322] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 1806: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave23:38508] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 1858: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave24:19707] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 1909: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave25:33683] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 1976: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave25:18587] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 2027: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave26:64531] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 2078: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave27:23333] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 2129: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave27:61136] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 2180: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave27:25118] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 2231: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave28:16274] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 2282: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave29:1324] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 2334: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave29:51664] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 2385: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave29:38854] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 2452: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave30:30088] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 2504: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave30:30778] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 2556: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave31:52263] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 2623: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave31:17806] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 2674: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave32:3251] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 2725: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave32:17832] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 2776: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave32:11629] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 2827: 15/03/10 18:52:08 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave33:22629] -> [akka.tcp://sparkDriver@slave75:60697] disassociated! Shutting down.
Line 2911: 15/03/10 18:52:07 WARN hdfs.DFSClient: error creating legacy BlockReaderLocal. Disabling legacy local reads.
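If that is not clear, you can get the log file from https://www.dropbox.com/s/lf50ger18v3ngtb/application_1425944520319_8102.log?dl=0.
The network on slave75 is fine, and the hosts file on every node is configured correctly.
Any reply would be helpful, thanks!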
I finally found the cause. YARN killed the executors (containers) because they exceeded their memory overhead allowance. Just increase spark.yarn.driver.memoryOverhead, spark.yarn.executor.memoryOverhead, or both.
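As a rough illustration of the fix above, here is a minimal Python sketch that assembles a spark-submit command line with the two overhead settings passed via --conf. The helper names (build_spark_submit, container_request_mb), the script name query.py, and the numbers are hypothetical and only for illustration; the overhead values are in megabytes. Note that from Spark 2.3 onward these properties are named spark.executor.memoryOverhead and spark.driver.memoryOverhead, so adjust the keys for your version.

```python
def build_spark_submit(app, executor_memory_mb, executor_overhead_mb,
                       driver_overhead_mb=None):
    """Return an argv list for spark-submit with explicit YARN memory overhead.

    All sizes are in megabytes. The property names are the Spark 1.x/2.x
    YARN keys mentioned in the answer above.
    """
    args = [
        "spark-submit",
        "--executor-memory", f"{executor_memory_mb}m",
        "--conf", f"spark.yarn.executor.memoryOverhead={executor_overhead_mb}",
    ]
    if driver_overhead_mb is not None:
        args += ["--conf",
                 f"spark.yarn.driver.memoryOverhead={driver_overhead_mb}"]
    args.append(app)
    return args


def container_request_mb(executor_memory_mb, executor_overhead_mb):
    """Approximate memory requested from YARN per executor container.

    YARN may round this up to its own allocation increment.
    """
    return executor_memory_mb + executor_overhead_mb


if __name__ == "__main__":
    # Hypothetical values: 4 GB heap plus 1 GB overhead per executor.
    cmd = build_spark_submit("query.py", 4096, 1024, driver_overhead_mb=512)
    print(" ".join(cmd))
    print(container_request_mb(4096, 1024), "MB requested per executor")
```

The resulting list can be passed to subprocess.run, or printed and pasted into a shell. The container request must still fit within the cluster's maximum YARN container size, otherwise the executors will not be scheduled at all.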
In my case, I solved the problem by increasing the number of parallel tasks that read the data into the RDD.
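The answer above does not say how the number of tasks was increased. One possible approach, sketched here under that assumption, is to pick a partition count so that each task reads a bounded amount of data and therefore needs less memory per executor. The helper name suggest_num_partitions and the 128 MiB default target are hypothetical choices, not values from the original post.

```python
import math


def suggest_num_partitions(total_input_bytes,
                           target_partition_bytes=128 * 1024 * 1024):
    """Suggest a partition count so each task handles roughly
    target_partition_bytes of input. Always returns at least 1."""
    if total_input_bytes <= 0:
        return 1
    return max(1, math.ceil(total_input_bytes / target_partition_bytes))


if __name__ == "__main__":
    n = suggest_num_partitions(10 * 1024 ** 3)  # hypothetical 10 GiB input
    print(n)
    # With a live SparkContext `sc`, the count could then be applied as:
    #   rdd = sc.textFile(path, minPartitions=n)
    # or, for an existing RDD:
    #   rdd = rdd.repartition(n)
```

More partitions mean smaller per-task working sets, which can keep each executor under the YARN container limit without raising the overhead settings.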
I had a similar problem when using worker type G.1X. After upgrading the worker type to G.2X, my job was able to complete the ETL of a 20-million-row DataFrame.
GlueJob:
  Type: AWS::Glue::Job
  Properties:
    GlueVersion: '2.0'
    NumberOfWorkers: '3'
    WorkerType: 'G.2X'